Variance and covariance, sums of squares, general additive models

The coefficient of correlation. For a matrix X that contains several variables, the correlation matrix is obtained from the centred matrix X − M, where M contains the column means: R = Σ_X⁻¹ (X − M)ᵀ (X − M) Σ_X⁻¹ / (n − 1). The matrix R is symmetric and contains all pairwise correlations between the variables. The diagonal matrix Σ_X contains the standard deviations as entries. We deal with samples, hence the divisor n − 1.

Pre- and postmultiplication. For a diagonal matrix D, premultiplication (DX) scales the rows of X, while postmultiplication (XD) scales its columns.

Linear regression: European bat species and environmental correlates.

Matrix approach to linear regression (N = 62). X is not a square matrix, hence X⁻¹ does not exist. Instead, the normal equations are solved: b = (XᵀX)⁻¹ Xᵀ Y.
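The normal-equation solution can be sketched in a few lines; the data here are hypothetical (five observations of a response y against one predictor x), and numpy is assumed:

```python
import numpy as np

# Hypothetical data: five observations of a response y against a predictor x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.3, 1.6, 1.85, 2.1, 2.35])

# Design matrix with a leading column of ones for the intercept.
# X is not square, so instead of X^-1 we solve the normal equations (X'X) b = X'y.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)  # b[0] = intercept, b[1] = slope
```

For an ill-conditioned XᵀX, np.linalg.lstsq is numerically preferable to forming the normal equations explicitly.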

The species–area relationship of European bats. What part of the variance is explained by our model? 1.16: average number of species per unit area (species density); 0.24: spatial species turnover.

How to interpret the coefficient of determination. Total variance = explained (model) variance + residual (unexplained) variance. Statistical testing is done by an F- or a t-test.
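A minimal sketch of this decomposition, using hypothetical observed and fitted values:

```python
import numpy as np

y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])      # observed values (hypothetical)
y_hat = np.array([2.2, 3.1, 4.6, 4.3, 5.8])  # fitted values from some model

ss_total = np.sum((y - y.mean()) ** 2)  # total sum of squares
ss_resid = np.sum((y - y_hat) ** 2)     # residual (unexplained) sum of squares
r2 = 1.0 - ss_resid / ss_total          # coefficient of determination
```

Note that the exact additivity of explained and residual sums of squares holds for least-squares fits that include an intercept.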

The general linear model. A model that assumes that a dependent variable Y can be expressed as a linear combination of predictor variables X is called a linear model: Y = Xβ + E. The vector E contains the error terms of each regression; the aim is to minimize these errors.

The general linear model. If the errors of the predictor variables are Gaussian, the error term e should also be Gaussian, and means and variances are additive: total variance = explained variance + unexplained (residual) variance.

Multiple regression:
1. Model formulation
2. Estimation of model parameters
3. Estimation of statistical significance

Multiple R and R 2

The coefficient of determination. The correlation matrix of the dependent variable y and the predictors x1, …, xm can be divided into four compartments: the correlations among the predictors and the correlations of each predictor with y.

Adjusted R². R: correlation matrix; n: number of cases; k: number of independent variables in the model. A variable whose contribution D to the model is negative (D < 0) is statistically not significant and should be eliminated from the model.
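The standard adjustment can be written as a one-line function (a sketch; the penalty grows with the number of predictors k):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared: penalizes R2 for the k predictors used on n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

Adding a predictor that does not raise R² lowers the adjusted value: adjusted_r2(0.9, 62, 4) < adjusted_r2(0.9, 62, 3).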

A mixed model

The final model. Is this model realistic?
- Negative species density
- Realistic increase of species richness with area
- Increase of species richness with winter length
- Increase of species richness at higher latitudes
- A peak of species richness at intermediate latitudes
The model makes a series of unrealistic predictions: our initial assumptions are wrong despite the high degree of variance explanation. The problem arises in part from the intercorrelation between the predictor variables (multicollinearity). We solve it by a stepwise approach, eliminating the variables that are either not significant or give unreasonable parameter values. The variance explanation of this final model is higher than that of the previous one.

Multiple regression solves systems of intrinsically linear algebraic equations.
- The matrix XᵀX must not be singular; that is, the variables have to be independent. Otherwise we speak of multicollinearity. Collinearity of r < 0.7 is in most cases tolerable.
- To be safely applied, multiple regression needs at least 10 times as many cases as variables in the model.
- Statistical inference assumes that the errors have a normal distribution around the mean.
- The model assumes linear (or algebraic) dependencies; check first for non-linearities.
- Check the distribution of the residuals Y_exp − Y_obs; this distribution should be random.
- Check whether the parameters have realistic values.
Multiple regression is a hypothesis-testing, not a hypothesis-generating technique! Polynomial regression; general additive model.
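The r < 0.7 rule of thumb can be checked directly on the predictor correlation matrix; a sketch with simulated data, where x2 is deliberately made nearly collinear with x1:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)                     # independent predictor

R = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
# flag predictor pairs whose |r| reaches the 0.7 tolerability threshold
upper = np.abs(R[np.triu_indices_from(R, k=1)])  # pairs (1,2), (1,3), (2,3)
too_high = upper >= 0.7
```

Here only the (x1, x2) pair should be flagged; such variables are candidates for elimination in the stepwise approach described above.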

Standardized coefficients of correlation. Z-transformed distributions have a mean of 0 and a standard deviation of 1. In the case of bivariate regression Y = aX + b, R_XX = 1, hence B = R_XY. The use of Z-transformed values therefore results in standardized correlation coefficients, termed β-values.
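The bivariate identity β = r can be verified numerically; the data below are simulated, and numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)  # hypothetical linear relation with noise

def zscore(v):
    return (v - v.mean()) / v.std()

# slope of the regression on Z-transformed values = the beta value
beta = np.polyfit(zscore(x), zscore(y), 1)[0]
r = np.corrcoef(x, y)[0, 1]  # simple coefficient of correlation
```

Up to floating-point error, beta equals r in the bivariate case.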

How to interpret beta-values. Beta-values are generalisations of simple coefficients of correlation. However, there is an important difference: the higher the correlation between two or more predictor variables (multicollinearity), the less r depends on the correlation between X and Y. Other variables might then have more and more influence on r and b, and at high levels of multicollinearity it becomes more and more difficult to interpret beta-values in terms of correlations. Because beta-values are standardized b-values, they should allow comparisons to be made about the relative influence of predictor variables, but high levels of multicollinearity can lead to misinterpretations; beta-values above one are always a sign of too high multicollinearity. Hence high levels of multicollinearity might
- reduce the exactness of beta-weight estimates,
- change the probabilities of making type I and type II errors,
- make it more difficult to interpret beta-values.
We might apply an additional parameter, the so-called coefficient of structure. The coefficient of structure c_i is defined from the simple correlation r_iY between predictor variable i and the dependent variable Y, and the coefficient of determination R² of the multiple regression. Coefficients of structure therefore measure the fraction of total variability a given predictor variable explains. Again, the interpretation of c_i is not always unequivocal at high levels of multicollinearity.

Partial correlations. The partial correlation r_xy·z is the correlation of the residuals εX and εY after removing the linear effect of Z from both variables. A semipartial correlation correlates a variable with one residual only.
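Both definitions translate directly into residual computations; a sketch using simple OLS residuals, with simulated data in which x and y are related only through z, so the partial correlation should vanish:

```python
import numpy as np

def residuals(v, z):
    """Residuals of v after removing the linear effect of z (simple OLS)."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (intercept + slope * z)

def partial_corr(x, y, z):
    """Partial correlation r_xy.z: correlate the residuals of x and y on z."""
    return np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

rng = np.random.default_rng(2)
z = rng.normal(size=2000)
x = z + rng.normal(size=2000)  # x and y share only the common cause z
y = z + rng.normal(size=2000)
```

The semipartial variant would correlate residuals(x, z) with the raw y instead.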

Path analysis and linear structure models. In multiple regression, the error term e contains the part of the variance in Y that is not explained by the model; these errors are called residuals. Regression analysis does not study the relationships between the predictor variables. Path analysis defines a whole model and tries to separate correlations into direct and indirect effects. In doing so, it tries to do something that is logically impossible: to derive causal relationships from sets of observations.

Path coefficients. Path analysis is largely based on the computation of partial coefficients of correlation. We start from the regression functions. Path analysis is a model-confirmatory tool; it should not be used to generate models or even to search for models that fit the data set.

From Z-transformed values we get E(e Z_Y) = 0, E(Z_Y Z_Y) = 1, and E(Z_X Z_Y) = r_XY. Path analysis is a nice tool to generate hypotheses. It fails at low coefficients of correlation and with circular model structures.

Non-metric multiple regression

Statistical inference. Rounding errors due to different precisions can cause the residual variance to be larger than the total variance.

Logistic and other regression techniques. We use odds: odds = p / (1 − p). The logistic regression model relates the log-odds linearly to the predictors.
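A sketch of the model with a single predictor: the logit (log-odds) is linear in x, and the logistic function inverts it (the coefficients b0, b1 here are hypothetical):

```python
import math

def logistic(b0, b1, x):
    """Probability predicted by the logistic model 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_odds(p):
    """The logit link: log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

p = logistic(-1.0, 0.5, 3.0)  # linear predictor b0 + b1*x = 0.5
```

Applying log_odds to the predicted probability recovers the linear predictor exactly.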

Generalized non-linear regression models. A special regression model that is used in pharmacology: b0 is the maximum response at dose saturation; b1 is the concentration that produces a half-maximum response; b2 determines the slope of the function, i.e. it is a measure of how fast the response increases with increasing drug dose.
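The parameter roles described above match a sigmoid Emax (Hill-type) curve; a sketch under that assumption:

```python
def dose_response(dose, b0, b1, b2):
    """Sigmoid Emax (Hill) model: b0 = maximum response at saturation,
    b1 = dose producing the half-maximum response, b2 = slope (Hill
    coefficient). A sketch matching the parameter roles in the text."""
    return b0 * dose ** b2 / (b1 ** b2 + dose ** b2)
```

At dose = b1 the response is b0/2, and for large doses it saturates at b0, consistent with the interpretation of the three parameters.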