Analysis of variance and statistical inference.


Repeated-measures designs. In medical research we test patients before and after treatment to infer the influence of the therapy. We have to divide the total variance (SStotal) into a part that contains the variance between patients (SSbetween) and a part within patients (SSwithin). The latter can be divided into a part that comes from the treatment (SStreat) and the error (SSerror): SStotal = SSbetween + SSwithin, and SSwithin = SStreat + SSerror.
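The partition above can be sketched in Python with NumPy. This is a minimal, hypothetical implementation assuming a complete patients-by-treatments table with one measurement per cell; the function name and the example scores are illustrative and not taken from the slides.

```python
import numpy as np

def repeated_measures_anova(y):
    """Partition sums of squares for a patients x treatments matrix y.

    Rows are patients (subjects), columns are the repeated measurements,
    e.g. before and after treatment. Assumes no missing cells.
    """
    y = np.asarray(y, dtype=float)
    n_patients, n_treat = y.shape
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    # Between patients: spread of each patient's mean around the grand mean
    ss_between = n_treat * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_within = ss_total - ss_between
    # Treatment: spread of each treatment mean around the grand mean
    ss_treat = n_patients * ((y.mean(axis=0) - grand) ** 2).sum()
    # What is left within patients after removing the treatment effect
    ss_error = ss_within - ss_treat
    df_treat = n_treat - 1
    df_error = (n_patients - 1) * (n_treat - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return {"total": float(ss_total), "between": float(ss_between),
            "within": float(ss_within), "treat": float(ss_treat),
            "error": float(ss_error), "F": float(f)}

# Hypothetical scores of three patients before and after treatment
print(repeated_measures_anova([[5, 7], [6, 9], [4, 5]]))
```

Because the between-patient variance is removed before the treatment effect is tested, the error term only contains variation within patients, which is why repeated designs are usually more sensitive than comparing separate groups.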

Ipsative data

Spiders from two Mazurian lake ensembles. Summary statistics

Starting hypotheses:
The degree of disturbance (human impact) influences species richness.
Species richness and abundance depend on island area and environmental factors.
Island ensembles differ in species richness and abundance.
Area, abundance, and species richness are non-linearly related.
Latitude and longitude do not influence species richness.

Sorted:
1. Area, abundance, and species richness are non-linearly related.
2. Latitude and longitude do not influence species richness.
3. Species richness and abundance depend on island area and environmental factors.
4. Island ensembles differ in species richness and abundance.
5. The degree of disturbance (human impact) influences species richness.

The hypotheses are not independent. Each hypothesis influences how the next one has to be treated.

Area, abundance, and species richness are non-linearly related. Species-area and individuals-area relationships
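A common way to describe a non-linear species-area relationship is the power function S = c * A^z, which becomes a straight line after log-transforming both axes. The sketch below assumes that model and fits it by ordinary least squares on log10 data; the slides do not state which function was used, and the island values are hypothetical. The same function works for the individuals-area relationship by passing abundances instead of species counts.

```python
import numpy as np

def fit_power_law(area, y):
    """Fit y = c * area**z by linear least squares on log10-transformed data.

    All values must be strictly positive (log of zero is undefined).
    """
    log_a = np.log10(np.asarray(area, dtype=float))
    log_y = np.log10(np.asarray(y, dtype=float))
    # For deg=1, np.polyfit returns [slope, intercept]
    z, log_c = np.polyfit(log_a, log_y, 1)
    return float(10 ** log_c), float(z)

# Hypothetical islands: area in ha and number of spider species
areas = [0.5, 2.0, 8.0, 30.0]
species = [12, 17, 25, 33]
c, z = fit_power_law(areas, species)
print(f"S = {c:.2f} * A^{z:.2f}")
```

Fitting on the log scale minimizes relative rather than absolute errors; a non-linear fit on the raw scale would generally give somewhat different c and z.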

Latitude and longitude do not influence species richness. Is species richness correlated with longitude and latitude? Does the distance between islands influence species richness? Are geographically near islands also similar in species richness irrespective of island area? R(S-Long) = 0.22 (n.s.), R(S-Lat) = 0.28 (n.s.). That there is no significant correlation does not mean that latitude and longitude have no influence on the regression model with environmental variables. Spatial autocorrelation (figure: study sites S1 to S6): in spatial autocorrelation the distance between study sites influences the response (dependent) variable. Spatially adjacent sites are then expected to be more similar with respect to the response variable.

Moran’s I as a measure of spatial autocorrelation. Moran’s I is similar to a correlation coefficient applied to pairs of cells of a spatial matrix. It differs by weighting the covariance to account for the spatial non-independence of cells with respect to distance. If cell values were randomly distributed (not spatially autocorrelated), the expected I is

$$E(I) = -\frac{1}{n-1}$$

where n is the number of cells. Statistical significance is calculated from a Monte Carlo simulation. (Figure: sites S1 to S6; all combinations of sites.)
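For reference, the usual definition is

$$I = \frac{n}{W}\,\frac{\sum_i \sum_j w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}, \qquad W = \sum_i \sum_j w_{ij},\; w_{ii} = 0.$$

The sketch below computes I and a Monte Carlo (permutation) significance test with NumPy. The inverse-distance weights are an assumption (the slides do not say which weighting was used), and the coordinates and abundances are hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for site values x and a spatial weight matrix w (w_ii = 0)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()
    # Weighted cross-products of deviations over all pairs of sites
    numerator = (w * np.outer(z, z)).sum()
    denominator = (z ** 2).sum()
    return float((n / w.sum()) * numerator / denominator)

def inverse_distance_weights(coords):
    """Assumed weighting scheme: w_ij = 1 / distance(i, j), zero on the diagonal.
    Sites must have distinct coordinates."""
    c = np.asarray(coords, dtype=float)
    d = np.sqrt(((c[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1))
    w = np.zeros_like(d)
    off_diagonal = ~np.eye(len(c), dtype=bool)
    w[off_diagonal] = 1.0 / d[off_diagonal]
    return w

def moran_permutation_test(values, weights, n_perm=999, seed=0):
    """Monte Carlo test: shuffle the values across sites and recompute I."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = morans_i(x, weights)
    simulated = np.array([morans_i(rng.permutation(x), weights)
                          for _ in range(n_perm)])
    # One-sided pseudo p-value for positive spatial autocorrelation
    p = (np.sum(simulated >= observed) + 1) / (n_perm + 1)
    return observed, float(p)

# Hypothetical island coordinates (km) and individuals per trap
coords = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
abundance = [3.1, 2.9, 3.4, 7.8, 8.1, 7.5]
w = inverse_distance_weights(coords)
print(moran_permutation_test(abundance, w))
print("E(I) without autocorrelation:", -1 / (len(abundance) - 1))
```

Shuffling the values while keeping the site positions fixed generates the null distribution of "no spatial structure"; the observed I is then compared with it.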

Individuals/trap is slightly spatially autocorrelated. Latitude and longitude slightly influence species richness. Even this weak effect might influence the outcome of a regression analysis.

Errors: too many variables! Solution: a prior factor analysis to reduce the number of independent (predictor) variables, and stepwise variable reduction using the Akaike information criterion (AIC). The lower the AIC, the more appropriate the model. (Tables: OLS result; spatial autoregression result; log-transformed variables.)
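Stepwise reduction by AIC can be sketched as backward elimination: start from the full OLS model and repeatedly drop the predictor whose removal lowers the AIC most, stopping when no removal helps. This is a minimal NumPy sketch assuming Gaussian errors (so AIC = n ln(RSS/n) + 2k); it ignores spatial autoregression, and the predictor names and data are hypothetical.

```python
import numpy as np

def ols_aic(y, X):
    """AIC = n*ln(RSS/n) + 2k for an OLS fit with intercept (k = number of coefficients)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(((y - design @ beta) ** 2).sum())
    return n * np.log(rss / n) + 2 * design.shape[1]

def backward_eliminate(y, X, names):
    """Drop one predictor at a time as long as the AIC keeps decreasing."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    best = ols_aic(y, X[:, keep])
    while keep:
        # AIC of every model that lacks exactly one of the remaining predictors
        trials = [(ols_aic(y, X[:, [j for j in keep if j != i]]), i) for i in keep]
        aic, drop = min(trials)
        if aic >= best:
            break
        best = aic
        keep.remove(drop)
    return [names[j] for j in keep], float(best)

# Hypothetical data: richness driven by only one of three candidate predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=40)
print(backward_eliminate(y, X, ["area", "humidity", "shading"]))
```

Backward elimination is greedy: it does not try every subset, so it can miss the globally best model when predictors are correlated.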

Information criteria. What function fits best? The more free parameters a model has, the higher R² will be. The more parsimonious a model is, the lower its bias towards type I errors. We have to find a compromise between goodness of fit and bias! (Figure: bias and explained variance against the number of model parameters, from few to many, marking the optimal number of model parameters.)

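The Akaike criterion of model choice

$$\mathrm{AIC} = 2k - 2\ln L$$

k: number of model parameters; L: maximized likelihood of the model.

If the errors are normal and independent we get

$$\mathrm{AIC} = n\ln\left(\frac{RSS}{n}\right) + 2k$$

n: number of data points; RSS: residual sum of squares.

If we fit using χ²:

$$\mathrm{AIC} = \chi^2 + 2k$$

If we fit using R² (up to a constant that is the same for all models fitted to the same data):

$$\mathrm{AIC} = n\ln\left(1 - R^2\right) + 2k$$

At small sample size we should use the following correction:

$$\mathrm{AIC_c} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$$

The preferred model is the one with the lowest AIC.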
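The formulas translate directly into a few helper functions. This is a sketch using only the standard library; the function names are hypothetical. The R² form differs from the RSS form only by the constant n ln(TSS/n), so it can be used to rank models fitted to the same data, but not to compare absolute AIC values across data sets.

```python
import math

def aic_from_loglik(log_l, k):
    """AIC = 2k - 2 ln L."""
    return 2 * k - 2 * log_l

def aic_from_rss(rss, n, k):
    """AIC for normal, independent errors: n ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def aic_from_chi2(chi2, k):
    """AIC when the model was fitted by minimizing chi-square."""
    return chi2 + 2 * k

def aic_from_r2(r2, n, k):
    """AIC from R^2, up to a constant shared by all models on the same data."""
    return n * math.log(1 - r2) + 2 * k

def aicc(aic, n, k):
    """Small-sample correction; requires n > k + 1."""
    return aic + 2 * k * (k + 1) / (n - k - 1)

print(aic_from_loglik(-10.0, 3))   # 26.0
print(aicc(26.0, 20, 3))           # 27.5
```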

We get the surprising result that the seemingly worst-fitting model appears to be the preferred one. A single outlier makes the difference: the single high residual makes the exponential fit worse.

Significant difference in model fit. Approximately, ΔAIC is statistically significant in favour of the model with the smaller AIC at the 5% error benchmark if |ΔAIC| > 2. The last model is significantly (5% level) the best.
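Applying the |ΔAIC| > 2 rule of thumb to a set of candidate models amounts to ranking them by AIC and measuring each model's distance from the best one. A minimal sketch; the model names and AIC values are hypothetical, and the threshold of 2 is the approximation stated on the slide, not an exact significance test.

```python
def compare_by_aic(aics, threshold=2.0):
    """Rank models by AIC; flag those worse than the best by more than threshold."""
    best = min(aics, key=aics.get)
    report = {}
    for name, value in sorted(aics.items(), key=lambda kv: kv[1]):
        delta = value - aics[best]
        # True means: the best model is preferred over this one (rule of thumb)
        report[name] = (delta, delta > threshold)
    return best, report

best, report = compare_by_aic({"linear": 10.0, "power": 7.5, "exponential": 9.0})
print(best, report)
```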

Stepwise variable elimination. Standardized coefficients (β-values) are equivalents of correlation coefficients and should therefore not exceed 1 in absolute value. Values above 1 point to too high a correlation between the predictor variables (collinearity). Collinearity disturbs any regression model and has to be eliminated prior to analysis, because highly correlated variables essentially contain the same information. Correlations of less than 0.7 can be tolerated. Hence, first check the matrix of correlation coefficients and eliminate variables that do not add information.
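Both checks, screening the correlation matrix at the 0.7 threshold and computing β-values, can be sketched with NumPy. The function names are hypothetical; β-values are obtained here by z-scoring the response and all predictors before the OLS fit, which is one standard way to get standardized coefficients.

```python
import numpy as np

def collinear_pairs(X, names, threshold=0.7):
    """Return predictor pairs whose |r| reaches the threshold."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

def standardized_coefficients(y, X):
    """Beta values: OLS slopes after z-scoring the response and every predictor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    # No intercept column needed: all standardized variables have mean zero
    beta, *_ = np.linalg.lstsq(zx, zy, rcond=None)
    return beta

# Hypothetical predictors: x2 nearly duplicates x1, x3 is unrelated
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
x3 = [2, 5, 1, 6, 3, 4]
print(collinear_pairs(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"]))
```

With a single predictor the β-value equals Pearson's r exactly; with several strongly correlated predictors the β-values can exceed 1 in absolute value, which is the warning sign described above.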

The final model. (Table: simple test-wise probability levels.) We still have to correct for multiple testing. Bonferroni correction: to get an experiment-wise error rate of 0.05, our test-wise error rates have to be less than 0.05/n, where n is the number of tests. The best model is not always the one with the lowest AIC or the highest R². Species richness is positively correlated with island area and negatively with soil humidity.
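The Bonferroni rule fits in a few lines. A minimal sketch with a hypothetical function name and hypothetical p-values: with three tests, each test must fall below 0.05/3 ≈ 0.0167.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the test-wise threshold alpha/n and which tests stay significant."""
    n = len(p_values)
    threshold = alpha / n
    return threshold, [p < threshold for p in p_values]

threshold, significant = bonferroni([0.001, 0.02, 0.04])
print(threshold, significant)  # 0.016666666666666666 [True, False, False]
```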

Island ensembles differ in species richness and abundance. Analysis of covariance (ANCOVA): species richness depends on environmental factors that may differ between island ensembles. A simple ANOVA does not detect any difference.
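The simple ANOVA compares the between-ensemble mean square with the within-ensemble mean square. A NumPy sketch of the F statistic, with a hypothetical function name and data; for the p-value, `scipy.stats.f_oneway` returns both F and p if SciPy is available.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand = pooled.mean()
    k, n = len(groups), pooled.size
    ss_between = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

# Hypothetical species counts per island in two lake ensembles
print(one_way_anova_f([1, 2, 3], [4, 5, 6]))  # 13.5
```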

Analysis of covariance (ANCOVA). ANCOVA is the combination of multiple regression and analysis of variance. First we perform a regression analysis and use the residuals of the full model as entries in the ANOVA: ANCOVA is the ANOVA on regression residuals. The metrically scaled variables serve as covariates. We use the regression residuals for further analysis: sites with very high positive residuals are particularly species-rich even after controlling for environmental factors. These are ecological hot spots, and regression analysis serves to identify them.
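The residual-based procedure described on this slide can be sketched as: regress richness on the metric covariates, run a one-way ANOVA of the residuals across island ensembles, and list sites whose residuals are unusually high. The NumPy sketch below follows that description; the two-standard-deviation cut-off for hot spots and all names and data are hypothetical assumptions, not the slide's criterion.

```python
import numpy as np

def regression_residuals(y, covariates):
    """OLS residuals of y on the metric covariates (intercept included)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residual_anova(y, covariates, groups):
    """One-way ANOVA F of the regression residuals across groups."""
    res = regression_residuals(y, covariates)
    groups = np.asarray(groups)
    samples = [res[groups == g] for g in np.unique(groups)]
    grand = res.mean()
    k, n = len(samples), res.size
    ss_b = sum(s.size * (s.mean() - grand) ** 2 for s in samples)
    ss_w = sum(((s - s.mean()) ** 2).sum() for s in samples)
    return float((ss_b / (k - 1)) / (ss_w / (n - k))), res

def hot_spots(residuals, site_names, z=2.0):
    """Sites whose residual exceeds z residual standard deviations (assumed cut-off)."""
    r = np.asarray(residuals, dtype=float)
    cut = z * r.std(ddof=1)
    return [s for s, v in zip(site_names, r) if v > cut]

# Hypothetical data: richness follows one covariate, island s5 is unusually rich
x = np.arange(1, 11, dtype=float)
y = 2 * x + 1
y[4] += 10
f, res = residual_anova(y, x, ["A"] * 5 + ["B"] * 5)
print(f, hot_spots(res, [f"s{i}" for i in range(1, 11)]))
```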

ANCOVA: species richness does not differ between island ensembles.

The degree of disturbance (human impact) influences species richness. Species richness of spiders on lake islands appears to be independent of the degree of disturbance.

How does abundance depend on environmental factors? The full model and stepwise variable elimination. All coefficients are highly significant! All standardized coefficients are above 1. This points to too high collinearity. We further eliminate uninformative variables. Abundance does not significantly depend on environmental variables.

How does abundance depend on the degree of disturbance? Abundance of spiders on lake islands appears to be independent of the degree of disturbance