
Presentation transcript:


1 – Intro & Hist. - Na Chan
2 – Basics of ANOVA - Alla Tashlitsky
3 – Data Collection - Bryan Rong
4 – Checking Assumptions in SAS - Junying Zhang
5 – 1-Way ANOVA Derivation - Yingying Lin and Wenyi Dong
6 – 1-Way ANOVA in SAS - Yingying Lin and Wenyi Dong
7 – 2-Way ANOVA Derivation - Peng Yang
8 – 2-Way ANOVA in SAS - Phil Caffrey and Yin Diao
9 – Multi-Way ANOVA Derivation - Michael Biro
10 – ANOVA and Regression – Cris (Jiangyang) Liu


USES OF T-TEST
– A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
– A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal.

USES OF T-TEST
– A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero.
– A test of whether the slope of a regression line differs significantly from 0.

BACKGROUND
If comparing means among more than 2 groups, 3 or more t-tests are needed:
– Time-consuming (the number of t-tests grows quickly)
– Inherently flawed (the probability of making a Type I error increases with each additional test)
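As a quick worked example (a sketch, assuming the pairwise tests are roughly independent): with k groups there are m = k(k−1)/2 pairwise comparisons, and the chance of at least one false rejection grows as

\[ P(\text{at least one Type I error}) \approx 1 - (1-\alpha)^m, \qquad m = \binom{k}{2}. \]

For k = 5 groups at \(\alpha = 0.05\), m = 10 and \(1 - 0.95^{10} \approx 0.40\), which is why a single ANOVA F-test is preferred.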

RONALD A. FISHER
Biologist, eugenicist, geneticist, statistician.
“A genius who almost single-handedly created the foundations for modern statistical science” - Anders Hald
“The greatest of Darwin's successors” - Richard Dawkins
Analysis of variance was used informally by researchers in the 1800s and formally proposed by Ronald A. Fisher in 1918.

HISTORY
Fisher proposed a formal analysis of variance in his 1918 paper The Correlation Between Relatives on the Supposition of Mendelian Inheritance. His first application of the analysis of variance was published in 1921. It became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers.

DEFINITION
ANOVA is an abbreviation for ANalysis Of VAriance: a procedure for comparing means from k independent groups, where k is 2 or greater.

ANOVA AND T-TEST
ANOVA and the t-test are similar: both compare means between groups.
– With 2 groups, either works.
– With more than 2 groups, ANOVA is better.

TYPES
ANOVA - analysis of variance
– One-way (F-ratio for 1 factor)
– Two-way (F-ratio for 2 factors)
ANCOVA - analysis of covariance
MANOVA - multivariate analysis of variance

APPLICATION
Biology, microbiology, medical science, computer science, industry, finance.


Definition
ANOVA can determine whether there is a significant relationship between variables. It is also used to determine whether a measurable difference exists between two or more sample means.
Objective: to identify important independent variables (predictor variables, the x_i's) and determine how they affect the response variable.
One-way, two-way, or multi-way ANOVA depends on the number of independent variables in the experiment that affect the outcome of the hypothesis test.

Model & Assumptions
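The model and its assumptions appeared as a figure; for reference, a standard way to write the one-way ANOVA model (generic notation, not necessarily the presenters' symbols) is

\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1,\dots,k,\; j = 1,\dots,n_i, \qquad \varepsilon_{ij} \overset{iid}{\sim} N(0,\sigma^2), \]

so each group mean is \(\mu_i = \mu + \tau_i\), and the assumptions are normal populations, independent observations, and a common variance \(\sigma^2\).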

Classes of ANOVA
1. Fixed Effects: concrete levels (e.g. sex, age)
2. Random Effects: a representative sample of levels (e.g. treatments, locations, tests)
3. Mixed Effects: a combination of fixed and random effects

Procedure
H_0: μ_1 = μ_2 = … = μ_k vs. H_a: at least one of the equalities does not hold.
F = MSR/MSE ~ f_{k, n−(k+1), α}; when there are only 2 means, F = t².
– where mean square regression MSR = SSR/1 and mean square error MSE = SSE/(n − 2) (simple-regression case).
The rejection region for a given significance level is F > f.

Regression
SST (total sum of squares) = SSR (regression sum of squares) + SSE (error sum of squares).
Sample variance: S² = MSE = SSE/(n − k) is an unbiased estimator for σ².

Mean Variation


Data Collection
3 industries: Application Software, Credit Service, Apparel Stores.
Sample 15 stocks from each industry.
For each stock, we observed the last 30 days and calculated:
– Mean daily percentage change
– Mean daily percentage range
– Mean volume

Application software
CA, Inc. [CA]; Compuware Corporation [CPWR]; Deltek, Inc. [PROJ]; Epicor Software Corporation [EPIC]; Fundtech Ltd. [FNDT]; Intuit Inc. [INTU]; Lawson Software, Inc. [LWSN]; Microsoft Corporation [MSFT]; MGT Capital Investments, Inc. [MGT]; Magic Software Enterprises Ltd. [MGIC]; SAP AG [SAP]; Sonic Foundry, Inc. [SOFO]; RealPage, Inc. [RP]; Red Hat, Inc. [RHT]; VeriSign, Inc. [VRSN]

Credit Service
Advance America, Cash Advance Centers, Inc. [AEA]; Alliance Data Systems Corporation [ADS]; American Express Company [AXP]; Asset Acceptance Capital Corp. [AACC]; Capital One Financial Corporation [COF]; CapitalSource Inc. [CSE]; Cash America International, Inc. [CSH]; Discover Financial Services [DFS]; Equifax Inc. [EFX]; Global Cash Access Holdings, Inc. [GCA]; Federal Agricultural Mortgage Corporation [AGM]; Intervest Bancshares Corporation [IBCA]; Manhattan Bridge Capital, Inc. [LOAN]; MicroFinancial Incorporated [MFI]; Moody's Corporation [MCO]

APPAREL STORES
Abercrombie & Fitch Co. [ANF]; American Eagle Outfitters, Inc. [AEO]; bebe stores, inc. [BEBE]; DSW Inc. [DSW]; Express, Inc. [EXPR]; J. Crew Group, Inc. [JCG]; New York & Company, Inc. [NWY]; Nordstrom, Inc. [JWN]; Pacific Sunwear of California, Inc. [PSUN]; The Gap, Inc. [GPS]; The Buckle, Inc. [BKE]; The Children's Place Retail Stores, Inc. [PLCE]; The Dress Barn, Inc. [DBRN]; The Finish Line, Inc. [FINL]; Urban Outfitters, Inc. [URBN]



Final Data Look


Major Assumptions of Analysis of Variance
The Assumptions:
– Normal populations
– Independent samples
– Equal (unknown) population variances
Our Purpose: examine these assumptions by graphical analysis of the residuals.

Residual Plot
Violations of the basic assumptions and model adequacy can be easily investigated by examining the residuals. We define the residual for observation j in treatment i as e_{ij} = y_{ij} − ȳ_{i·}, the observation minus its treatment mean. If the model is adequate, the residuals should be structureless; that is, they should contain no obvious patterns.

Normality
Why normal?
– ANOVA is an analysis of variance: more specifically, of the ratio of two variances.
– Statistical inference is based on the F distribution, which is given by the ratio of two chi-squared distributions.
– So it is no surprise that each variance in the ANOVA ratio should come from a parent normal distribution.
Normality is only needed for statistical inference.

SAS code for getting residuals

/* read the Excel file into a SAS data set (DBMS and REPLACE added here to make the IMPORT call complete) */
PROC IMPORT DATAFILE='C:\Users\junyzhang\Desktop\mydata.xls' OUT=stock DBMS=xls REPLACE;
RUN;
PROC PRINT DATA=stock; RUN;

/* fit the one-way model and save fitted values and residuals */
PROC GLM DATA=stock;
  CLASS indu;                       /* industry factor */
  MODEL adpcdata = indu;            /* average daily percentage change by industry */
  OUTPUT OUT=stock1 P=yhat R=resid;
RUN;
PROC PRINT DATA=stock1; RUN;

Normality Test
The normal probability plot of the residuals is used to check the normality assumption.

proc univariate data=stock1 normal plot;
  var resid;
run;

Normality Tests (SAS output): tables of Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling statistics and p-values, followed by normal probability plots of the residuals.

Normality Tests

Independence
Independent observations:
– No correlation between error terms
– No correlation between independent variables and the errors
Positively correlated data inflates the true standard error:
– The estimates of the treatment means are less accurate than the reported standard error suggests.
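A standard result makes this concrete (a sketch in generic notation): for n observations with common variance \(\sigma^2\) and average pairwise correlation \(\bar{\rho} > 0\),

\[ \operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n}\bigl(1 + (n-1)\bar{\rho}\bigr) > \frac{\sigma^2}{n}, \]

while the usual standard error is computed as if \(\operatorname{Var}(\bar{x}) = \sigma^2/n\), so it understates the true uncertainty.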

SAS code for the independence check
The plot of the residuals against the factor is used to check independence.

proc plot;
  plot resid*indu;
run;

Independence Tests

Homogeneity of Variances
Eisenhart (1947) describes the problem of unequal variances as follows:
– The ANOVA model is based on the ratio of the mean squares of the factors to the residual mean square.
– The residual mean square is the unbiased estimator of σ², the variance of a single observation.
– The between-treatment mean square takes into account not only the differences between observations, σ², just like the residual mean square, but also the variance between treatments.
– If there is non-constant variance among treatments, we can replace the residual mean square with some overall variance, σ_a², and a treatment variance, σ_t², which is some weighted version of σ_a².
– The “neatness” of ANOVA is lost.

SAS code for the homogeneity of variances check
The plot of the residuals against the fitted values is used to check the constant variance assumption.

proc plot;
  plot resid*yhat;
run;
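As a complement to the residual-versus-fitted plot, SAS can also run a formal homogeneity-of-variance test from the MEANS statement of PROC GLM. A minimal sketch, assuming the same data set and variable names used above (stock, indu, adpcdata):

proc glm data=stock;
  class indu;                          /* industry factor */
  model adpcdata = indu;
  means indu / hovtest=levene welch;   /* Levene's test of equal variances; Welch ANOVA as a robust fallback */
run;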

Data with Homogeneity of Variances

Tests for Homogeneity of Variances

Results for our data:
– Normal populations
– Nearly independent samples
– Equal (unknown) population variances
So we can employ ANOVA to analyze our data.


Derivation – 1-Way ANOVA
Hypotheses:
– H_0: μ = μ_1 = μ_2 = μ_3 = … = μ_n
– H_1: μ_i ≠ μ_j for some i, j
We assume that the j-th observation in group i is related to the mean by x_ij = μ + (μ_i − μ) + ε_ij, where ε_ij is a random noise term. We wish to separate the variability of the individual observations into a part due to differences between groups and a part due to individual variability.

Derivation – 1-Way ANOVA – Cont’ 46

Derivation – 1-Way ANOVA – Cont’ We can show that Using the above equation, we define 47

Derivation – 1-Way ANOVA – Cont’ Given the distributions of the MSS values, we can reject the null hypothesis if the between group variance is significantly higher than the within group variance. That is, We reject the null hypothesis if F > f n-1,N-n,α 48

Brief Summary Statistics Code

proc means data=stock maxdec=5 n mean std;  /* data must be sorted by industry for BY processing */
  by industry;
  var ADPC;
run;

Get simple summary statistics (sample size, sample mean, and SD of ADPC for each industry) with a maximum of 5 decimal places.

Brief Summary Statistics Output: a table of N, Mean, and Std Dev of ADPC for each industry (Apparel Stores, Application Software, Credit Service).

Data Plot Code

proc plot data=stock;
  plot industry*ADPC;
run;

Produce crude graphical output.

Data Plot Output: text plot of industry*ADPC (legend: A = 1 obs, B = 2 obs, D = 4 obs), showing the spread of ADPC within each of the three industries (CreditSe, Applicat, ApparelS).

One-Way ANOVA Test Code

proc anova data=stock;
  class industry;                  /* CLASS statement indicates that industry is a factor */
  model ADPC=industry;             /* assumes industry influences average daily percentage change */
  means industry/tukey cldiff;     /* multiple comparison by Tukey's method: actual confidence intervals */
  means industry/tukey lines;      /* pictorial display of the comparisons */
run;

GLM Analysis Code

proc glm data=stock;
  class industry;
  model ADPC=industry;
  output out=stockfit p=yhat r=resid;
run;

This procedure is similar to PROC ANOVA, but GLM allows residual plots (at the cost of more output).

One-Way ANOVA Test Output: the ANOVA table for dependent variable ADPC, with Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, and Pr > F, plus R-Square, Coeff Var, Root MSE, and the ADPC mean.

One-Way ANOVA Test: Tukey's Studentized Range (HSD) Test for ADPC, reporting Alpha, Error Degrees of Freedom, Error Mean Square, Critical Value of Studentized Range, and Minimum Significant Difference.

One-Way ANOVA Test: table of differences between industry means with simultaneous 95% confidence limits for each comparison (Applicat - ApparelS, Applicat - CreditSe, ApparelS - Applicat, ApparelS - CreditSe, CreditSe - Applicat, CreditSe - ApparelS).

Univariate Procedure Code

proc univariate data=stockfit plot normal;
  var resid;
run;

We use PROC UNIVARIATE to produce the stem-and-leaf and normal probability plots, and we use the stem-and-leaf plot to visualize the overall distribution of a variable.

Univariate Procedure Output – Moments: N = 45, Sum Weights = 45, Mean = 0, Sum Observations = 0; the remaining moments (Std Deviation, Variance, Skewness, Kurtosis, Uncorrected SS, Corrected SS, Coeff Variation, Std Error Mean) are also reported.

Tests for Location (Mu0 = 0): Student's t (t = 0), Sign (M = -1.5), and Signed Rank tests with their statistics and p-values.

Basic Statistical Measures: location (Mean, Median, Mode) and variability (Std Deviation, Variance, Range, Interquartile Range).

Tests for Normality: Shapiro-Wilk (W), Kolmogorov-Smirnov (D), Cramer-von Mises (W-Sq), and Anderson-Darling (A-Sq) statistics with p-values.

Quantiles: estimates for 100% Max, 99%, 95%, 90%, 75% Q3, 50% Median, 25% Q1, 10%, 5%, 1%, and 0% Min.

Extreme Observations: the lowest and highest residual values with their observation numbers.

Stem-and-Leaf Plot and Boxplot of the residuals (stem.leaf values multiplied by 10**-3).

Plot Code

proc plot;
  plot resid*industry;
  plot resid*yhat;
run;

Plot the residuals against industry and against the fitted ADPC values (yhat).

Normal Probability Plot of the residuals.

Plot of resid*industry (legend: A = 1 obs, B = 2 obs, D = 4 obs) across the three industries (ApparelS, Applicat, CreditSe).

Plot of resid*yhat (legend: A = 1 obs, B = 2 obs, D = 4 obs).

Conclusion
From the one-way ANOVA we obtain F = 1.00 with a p-value greater than 0.05. Since the p-value is large, we fail to reject the null hypothesis, which indicates that there is no significant difference between the mean daily percentage changes of stocks in the different industries. Thus, by this measure it makes no difference which industry's stocks we buy in the long term.


We now have two factors (A & B).

Linear Model and Dot Notation
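The model and dot notation appeared as a figure; a standard way to write them (assuming factor A has a levels, factor B has b levels, and n replicates per cell) is

\[ x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \overset{iid}{\sim} N(0,\sigma^2), \]

with dot notation \(\bar{x}_{i\cdot\cdot}\), \(\bar{x}_{\cdot j\cdot}\), \(\bar{x}_{ij\cdot}\), \(\bar{x}_{\cdot\cdot\cdot}\) for the factor-level, cell, and grand means.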

Least Squares Method
SST = SSA + SSB + SSAB + SSE
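In the notation above, a sketch of the standard formulas for these components and their degrees of freedom:

\[ SSA = bn\sum_i(\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot\cdot\cdot})^2 \;(a-1\ \text{d.f.}), \qquad SSB = an\sum_j(\bar{x}_{\cdot j\cdot}-\bar{x}_{\cdot\cdot\cdot})^2 \;(b-1\ \text{d.f.}), \]
\[ SSAB = n\sum_i\sum_j(\bar{x}_{ij\cdot}-\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot j\cdot}+\bar{x}_{\cdot\cdot\cdot})^2 \;((a-1)(b-1)\ \text{d.f.}), \qquad SSE = \sum_i\sum_j\sum_k(x_{ijk}-\bar{x}_{ij\cdot})^2 \;(ab(n-1)\ \text{d.f.}). \]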

Test Criteria and Rejection Conditions
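This slide's formulas were a figure; the usual form of the tests, with MS = SS/d.f. for each term, is

\[ F_A = \frac{MSA}{MSE} > F_{\alpha;\,a-1,\,ab(n-1)}, \qquad F_B = \frac{MSB}{MSE} > F_{\alpha;\,b-1,\,ab(n-1)}, \qquad F_{AB} = \frac{MSAB}{MSE} > F_{\alpha;\,(a-1)(b-1),\,ab(n-1)}, \]

each comparison leading to rejection of the corresponding null hypothesis (no A effect, no B effect, no interaction).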

Pivotal Quantity

Pivotal Quantity (Cont’) 77

Two-Way ANOVA in SAS. By: Philip Caffrey & Yin Diao

Model
An extension of one-way ANOVA. It provides more insight into how the two IVs interact and individually affect the DV. Thus, the main effects and the interaction effect that the two IVs have on the DV need to be tested. Model and null hypotheses:
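The formulas on the slide were figures; in the notation of the derivation section, the model is \(x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}\) and the null hypotheses are

\[ H_{0A}: \alpha_1 = \dots = \alpha_a = 0, \qquad H_{0B}: \beta_1 = \dots = \beta_b = 0, \qquad H_{0AB}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j. \]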

Sum of Squares
Every term compared with the error term leads to an F distribution. In this way, we can conclude whether there is a main effect or an interaction effect.
SS_TOTAL = SS_A + SS_B + SS_INTERACTION + SS_ERROR

Example
Using the same data from the one-way analysis, we will now separate the data further by introducing a second factor, Average Daily Volume.

Example
Factor 1: Industry – Apparel Stores, Application Software, Credit Services.
Factor 2: Average Daily Volume – Low, Medium, High.

Two-Way Design: a 3 × 3 layout, Industry (Credit, Apparel, Software) by Volume (Low, Medium, High), with 5 stocks in each cell.

Using SAS
SAS code:

/* read the Excel file (DBMS and REPLACE added to make the IMPORT call complete) */
PROC IMPORT DATAFILE='G:\Stony Brok Univ Text Books\AMS Project\Data.xls' OUT=TWOWAY DBMS=xls REPLACE;
RUN;

PROC ANOVA DATA=TWOWAY;
  TITLE "ANALYSIS OF STOCK DATA";
  CLASS INDUSTRY VOLUME;
  MODEL ADPC = INDUSTRY | VOLUME;          /* main effects plus interaction */
  MEANS INDUSTRY | VOLUME / TUKEY CLDIFF;
RUN;

Using SAS

/* PLOT THE CELL MEANS */
PROC MEANS DATA=TWOWAY NWAY NOPRINT;
  CLASS INDT ADTV;              /* industry and average-daily-volume class variables */
  VAR ADPC;
  OUTPUT OUT=MEANS MEAN=;       /* one record of cell means per Industry x Volume combination */
RUN;
PROC GPLOT DATA=MEANS;
  PLOT ADPC*ADTV=INDT;          /* interaction plot: mean ADPC against volume, one line per industry */
RUN;

ANOVA Table: no significant results.

To test the main effect of one IV, we combine all the data over the other IV, and this is what the one-way ANOVA did. From the two-way ANOVA we know there are no significant main effects or interaction effect of the two IVs. To see whether there is an interaction effect, we can plot the means of each cell formed by the combinations of all levels of the IVs.

Plot of Cell Means: Industry by Average Daily Volume.

Interpreting the Output
Given that the F tests were not significant, we would normally stop our analysis here. If an F test were significant, we would want to know exactly which means differ from each other, using Tukey's test:
MEANS INDUSTRY | VOLUME / TUKEY CLDIFF;

Interpreting the Output – Comparing Means: a table of the difference between means and its 95% confidence interval for each comparison (Software - Apparel, Software - Credit, Credit - Apparel, MedVol - LowVol, MedVol - HighVol, HighVol - LowVol).

Conclusion
We cannot conclude that there is a significant difference between any of the group means. The two IVs show no significant effects on the DV.


M-Way ANOVA (Derivation)
Let us have n factors, A_1, A_2, …, A_n, each with 2 or more levels, a_1, a_2, …, a_n, respectively. Then there are N = a_1 a_2 ⋯ a_n types of treatment to conduct, with each treatment having sample size n_i. Let x_{i_1 i_2 … i_n k} be the k-th observation from treatment i_1 i_2 … i_n. By the assumptions for ANOVA, x_{i_1 i_2 … i_n k} is a random variable that follows the normal distribution. We use the model x_{i_1 i_2 … i_n k} = μ_{i_1 i_2 … i_n} + ε_{i_1 i_2 … i_n k}, where the residuals ε_{i_1 i_2 … i_n k} are i.i.d. N(0, σ²).



M-Way ANOVA (Derivation)
These sums of squares are all distributed as independent χ² random variables (when multiplied by the correct constants and when the corresponding hypotheses hold), with degrees of freedom satisfying the following identity.
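The identity itself was a figure; for a full factorial with factors of sizes a_1, …, a_m and r replicates per treatment (a sketch assuming equal replication), the standard decomposition of the total degrees of freedom is

\[ a_1 a_2 \cdots a_m\, r - 1 \;=\; \sum_{\emptyset \neq S \subseteq \{1,\dots,m\}} \prod_{i \in S} (a_i - 1) \;+\; a_1 a_2 \cdots a_m\,(r - 1), \]

where each term in the sum is the degrees of freedom of the effect involving the factors in S, and the last term is the error degrees of freedom.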

M-Way ANOVA (Derivation)
There are a total of 2^m hypotheses in an m-way ANOVA:
– the null hypothesis, which states that there is no difference or interaction between factors;
– for k from 1 to m, there are C(m, k) alternative hypotheses about the interactions among every collection of k factors.
Then we have 1 + C(m,1) + C(m,2) + … + C(m,m) = 2^m by a well-known combinatorial identity.

M-Way ANOVA (Derivation)
These hypotheses are:
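A generic way to state them (a sketch in effect-model notation): for each nonempty collection of factors \(S \subseteq \{1,\dots,m\}\),

\[ H_{A,S}:\ \text{the main effect or interaction among the factors in } S \text{ is not zero}, \]

together with the overall null hypothesis that all of these effects are zero.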

M-Way ANOVA (Derivation)
We want to see whether the variability between groups is larger than the variability within the groups. To do this, we use the F distribution as our pivotal quantity, and then we can derive the proper tests, very similar to the 1-way and 2-way tests.
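A sketch of the resulting tests in generic notation: for any effect S with mean square MS_S on df_S degrees of freedom and error mean square MS_E on df_E degrees of freedom,

\[ F_S = \frac{MS_S}{MS_E} \sim F_{df_S,\, df_E} \ \text{under } H_{0,S}, \qquad \text{reject } H_{0,S} \ \text{when } F_S > F_{\alpha;\, df_S,\, df_E}. \]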


Relationship between ANOVA and Regression
Presenter: Cris J.Y. Liu

What we know:
– Regression is the statistical model you use to predict a continuous outcome on the basis of one or more continuous predictor variables.
– ANOVA compares several groups (usually categorical predictor variables) in terms of a certain dependent variable (a continuous outcome). (If there is a mixture of categorical and continuous predictors, ANCOVA is an alternative method.)
Take a second look: they are just different sides of the same coin!

Review of ANOVA
Compare the means of different groups: n groups, n_i elements in the i-th group, N elements in total.
SST = SS_between + SS_within
What about only two groups, X and Y, each with n observations?
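The answer (a standard identity, sketched here): with two groups of n observations each, the ANOVA F statistic is exactly the square of the pooled two-sample t statistic,

\[ F = \frac{MS_{between}}{MS_{within}} = \frac{n(\bar{X}-\bar{Y})^2}{2 s_p^2} = t^2, \]

on (1, 2n − 2) degrees of freedom, where \(s_p^2\) is the pooled sample variance; so two-group ANOVA and the two-sided pooled t-test agree.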

Review of Simple Linear Regression
We try to find a line y = β_0 + β_1 x that best fits our data, so that we can calculate the best estimate of y from x. It finds the β_0 and β_1 that minimize the distance Q between the actual and estimated scores. Let the predicted values form one group, while the other group consists of the original values: it is a special (and also simple) case of ANOVA!
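The objective Q was shown as a figure (labeled "minimize me"); in standard notation it and its minimizers are

\[ Q(\beta_0,\beta_1) = \sum_{i=1}^{n}\bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2, \qquad \hat{\beta}_1 = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}. \]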

Review of Regression
SST = SSM + SSE, i.e., Total = Model + Error, which parallels (Between) + (Within) in ANOVA.
Degrees of freedom: Total n − 1 = Model (2 − 1 = 1) + Error (n − 2).

ANOVA Table of Regression
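The table itself was a figure; the generic ANOVA table of a simple linear regression has the form (a reconstruction, not the slide's own numbers):

Source   d.f.    SS     MS                  F
Model    1       SSR    MSR = SSR/1         MSR/MSE
Error    n - 2   SSE    MSE = SSE/(n - 2)
Total    n - 1   SST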

How are they alike?
If we use the group means as the X values from which we predict Y, we can see that ANOVA and regression are the same! The group mean is the best prediction of a Y-score.

Term comparison (Regression vs. ANOVA):
– Dependent variable; explanatory variable; total mean
– SSR ↔ SS between
– SSE ↔ SS within

Term comparison if there is more than one predictor (Regression vs. ANOVA):
– Multiple regression ↔ multi-way ANOVA
– Dummy variable ↔ categorical variable
– Interaction effect; covariance; …

Notes: Both are applicable only when the outcome variable is continuous. They share basically the same procedure for checking the underlying assumptions.

Robust ANOVA - Taguchi Method

What is Robustness?
The term “robustness” is often used to refer to methods designed to be insensitive to distributional assumptions (such as normality) in general, and to unusual observations (“outliers”) in particular.
Why Robust ANOVA? There is always the possibility that some observations contain excessive noise, and excessive noise during experiments might lead to incorrect inferences. Robust methods are widely used in quality control.

Robust ANOVA
What do we want from robust ANOVA? Robust ANOVA methods should withstand non-ideal conditions while being no more difficult to perform than ordinary ANOVA. The standard technique, the least squares method, is highly sensitive to unusual observations.

Robust ANOVA
Our aim is to choose β to minimize Σ_i ρ(y_i − x_i'β). In standard ANOVA (least squares) we let ρ(x) = x²; we can also try some other ρ(x).

Least Absolute Deviation
It is well known that the median is much more robust to outliers than the mean. The least absolute deviation (LAD) estimate takes ρ(x) = |x|.
How is LAD related to the median? The LAD estimator determines the “center” of the data set by minimizing the sum of the absolute deviations from the estimate of the center, which turns out to be the median. It has been shown to be quite effective in the presence of fat-tailed data.

M-Estimation
M-estimation is based on replacing ρ(·) with a function that is less sensitive to unusual observations than the quadratic. The “M” indicates that ρ is chosen in the spirit of maximum likelihood estimation. LAD, with ρ(x) = |x|, is an example of a robust M-estimator. Another popular choice of ρ is Tukey's bisquare: ρ(r; c) = 1 − [1 − (r/c)²]³ for |r| ≤ c, and ρ(r; c) = 1 otherwise, where r is the residual and c is a constant.

Suggestion
These robust analyses may not take the place of standard ANOVA analyses in this context; rather, we believe that the robust analyses should be undertaken as an adjunct to the standard analyses.

