Jennifer Siegel. Statistical background: Z-Tests, T-Tests, ANOVAs.


Jennifer Siegel

Statistical background: Z-Tests, T-Tests, ANOVAs

Science tries to predict the future
- Genuine effect?
- Attempt to strengthen predictions with stats
- Use the p-value to indicate our level of certainty that the result reflects a genuine effect on the whole population (more on this later…)

Develop an experimental hypothesis:
- H0 = null hypothesis
- H1 = alternative hypothesis
A statistically significant result: p-value ≤ .05

The p-value: the probability of observing a result at least this extreme if the null hypothesis were true
- Significance level = .05, or 5%
- Below that level, we treat the experimental effect as genuine with "95% certainty"

Type 1 error = false positive
Type 2 error = false negative
Confidence level = 1 − probability of a Type 1 error (e.g., 1 − .05 = .95)

Let’s pretend you came up with the following theory… Having a baby increases brain volume (associated with possible structural changes)

Z-test vs. T-test

Population

- Cost
- Not able to include everyone
- Too time consuming
- Ethical right to privacy
Realistically, researchers can only do sample-based studies

t = (difference between sample means) / (standard error of the sample means)
Degrees of freedom = sample size − 1
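The slides contain no code, but a minimal Python sketch (NumPy/SciPy; the before/after numbers are made up purely for illustration) shows how this formula plays out for a paired design like the brain-volume example:

import numpy as np
from scipy import stats

# Hypothetical brain-volume measurements (cm^3) for 8 mothers, before and after birth
before = np.array([1210., 1195., 1250., 1220., 1188., 1232., 1205., 1241.])
after  = np.array([1228., 1201., 1263., 1219., 1199., 1240., 1215., 1253.])

diff = after - before                  # per-subject differences (paired design)
n = len(diff)
se = diff.std(ddof=1) / np.sqrt(n)     # standard error of the mean difference
t = diff.mean() / se                   # t = mean difference / standard error
df = n - 1                             # degrees of freedom = sample size - 1
p = 2 * stats.t.sf(abs(t), df)         # two-tailed p-value

# SciPy's built-in paired t-test gives the same answer
t_check, p_check = stats.ttest_rel(after, before)
assert np.isclose(t, t_check) and np.isclose(p, p_check)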

H0: there is no difference in brain size before and after giving birth
H1: the brain is significantly smaller or significantly larger after giving birth (a difference is detected)

t = (mean brain volume after − mean before) / (standard error of the difference)

Women have a significantly larger brain after giving birth

- One-sample (sample vs. hypothesized mean)
- Independent groups (2 separate groups)
- Repeated measures (same group, different measures)
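SciPy offers one function per variant; a minimal sketch with simulated data (all numbers here are illustrative, not from the slides):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(100, 15, size=30)   # sample / condition 1
b = rng.normal(106, 15, size=30)   # sample / condition 2

t1, p1 = stats.ttest_1samp(a, popmean=100)   # one-sample: a vs. a hypothesized mean
t2, p2 = stats.ttest_ind(a, b)               # independent groups: 2 separate groups
t3, p3 = stats.ttest_rel(a, b)               # repeated measures: a and b as two measures of the same subjects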

ANOVA = ANalysis Of VAriance
- Factor = what is being compared (e.g., type of pregnancy)
- Levels = the different elements of a factor (e.g., age of mother)
- F-statistic
- Post hoc testing

- One-way ANOVA: 1 factor with more than 2 levels
- Factorial ANOVA: more than 1 factor
- Mixed-design ANOVA: some factors are independent, others are related

A significant F tells you there is a significant difference somewhere between groups, NOT where the difference lies. Finding exactly where the difference lies requires further statistical analysis = post hoc analysis.
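A sketch of that two-step workflow, with three simulated groups (scipy.stats.f_oneway for the omnibus test, then Bonferroni-corrected pairwise t-tests as a simple post hoc step):

import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(1)
groups = {
    "g1": rng.normal(10, 2, size=20),
    "g2": rng.normal(10, 2, size=20),
    "g3": rng.normal(13, 2, size=20),   # this level was simulated to differ
}

# Omnibus test: is there a significant difference *somewhere* between groups?
F, p = stats.f_oneway(*groups.values())

# Post hoc analysis: pairwise t-tests, Bonferroni-corrected, to find *where*
pairs = list(combinations(groups, 2))
for name_a, name_b in pairs:
    t, p_pair = stats.ttest_ind(groups[name_a], groups[name_b])
    print(f"{name_a} vs {name_b}: corrected p = {min(1.0, p_pair * len(pairs)):.4f}")

(A fuller analysis would typically use Tukey's HSD; Bonferroni-corrected t-tests are just the simplest post hoc sketch.)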

- Z-tests for populations
- T-tests for samples
- ANOVAs compare more than 2 groups in more complicated scenarios

Varun V. Sethi

Objectives:
- Correlation
- Linear regression
- Take-home points

Correlation: how linear is the relationship between two variables? (descriptive)
Regression: how well does a linear model explain my data? (inferential)

Correlation reflects the noisiness and direction of a linear relationship (top row of the illustrative figure), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row).

Strength and direction of the relationship between variables
Scattergrams (Y vs. X): positive correlation, negative correlation, no correlation

Measures of correlation:
1) Covariance
2) Pearson correlation coefficient (r)

1) Covariance
The covariance is a statistic representing the degree to which 2 variables vary together:

cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

cf. the variance formula:

s_x² = Σ(xᵢ − x̄)² / (n − 1)

{Note that s_x² = cov(x, x)}
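A quick numeric check of the two formulas above (Python/NumPy; the x and y values are arbitrary toy data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)   # cov(x, y) by the formula

assert np.isclose(cov_xy, np.cov(x, y)[0, 1])         # matches NumPy's estimate
assert np.isclose(np.cov(x, x)[0, 1], x.var(ddof=1))  # cov(x, x) = variance of x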

2) Pearson correlation coefficient (r)
- r is a kind of 'normalised' (dimensionless) covariance:

r = cov(x, y) / (s_x · s_y)    (s = standard deviation of the sample)

- r takes values from −1 (perfect negative correlation) to 1 (perfect positive correlation); r = 0 means no correlation
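Continuing the toy data from above, r is just the covariance rescaled by the two standard deviations:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))   # r = cov(x, y) / (s_x * s_y)

r_check, p_value = stats.pearsonr(x, y)   # SciPy agrees (and also returns a p-value)
assert np.isclose(r, r_check)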

Limitations:
- Sensitive to extreme values
- Describes a relationship, not a prediction
- Not causality

Regression: Prediction of one variable from knowledge of one or more other variables

How well does a linear model (y = ax + b) explain the relationship between two variables?
- If there is such a relationship, we can 'predict' the value of y for a given x (e.g., the plotted point (25, 7.498))

Linear dependence between 2 variables
Two variables are linearly dependent when the increase of one variable is proportional to the increase of the other.
Examples:
- Energy needed to boil water
- Money needed to buy coffeepots

Fitting data to a straight line (or vice versa): ŷ = ax + b
- ŷ: predicted value of y
- a: slope of the regression line
- b: intercept
Residual error (εᵢ): the difference between the observed and predicted values of y, εᵢ = yᵢ − ŷᵢ
The best-fit line (values of a and b) is the one that minimises the sum of squared errors: SS_error = Σ(yᵢ − ŷᵢ)²

Adjusting the straight line to the data: minimise Σ(yᵢ − ŷᵢ)², which is Σ(yᵢ − axᵢ − b)²
The minimum SS_error is at the bottom of the curve, where the gradient is zero – and this can be found with calculus.
Take partial derivatives of Σ(yᵢ − axᵢ − b)² with respect to the parameters a and b, and solve for 0 as simultaneous equations, giving:

a = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = cov(x, y) / s_x²
b = ȳ − a·x̄

This can always be done.
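A sketch of these closed-form solutions on the same toy data, checked against NumPy's least-squares fit:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

a = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()  # slope
b = y.mean() - a * x.mean()                                               # intercept

a_check, b_check = np.polyfit(x, y, deg=1)   # solves the same least-squares problem
assert np.allclose([a, b], [a_check, b_check])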

We can calculate the regression line for any data, but how well does it fit the data?
Total variance = predicted variance + error variance: s_y² = s_ŷ² + s_er²
It can also be shown that r² is the proportion of the variance in y that is explained by our regression model: r² = s_ŷ² / s_y²
Substituting s_ŷ² = r²·s_y² into s_y² = s_ŷ² + s_er² and rearranging gives: s_er² = s_y²(1 − r²)
From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction.
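These identities can be verified numerically on the toy data (population variances, NumPy's default ddof=0, make the decomposition exact for a least-squares fit with an intercept):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b            # predicted values
resid = y - y_hat            # residual errors

assert np.isclose(y.var(), y_hat.var() + resid.var())   # s_y^2 = s_yhat^2 + s_er^2
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(r ** 2, y_hat.var() / y.var())        # r^2 = explained proportion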

Do we get a significantly better prediction of y from our regression equation than by just predicting the mean? F-statistic
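For simple linear regression this comparison is an F-test with 1 and n − 2 degrees of freedom; a sketch on the same toy data:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b

ss_model = ((y_hat - y.mean()) ** 2).sum()   # variation explained by the line
ss_error = ((y - y_hat) ** 2).sum()          # variation left over

F = (ss_model / 1) / (ss_error / (n - 2))
p = stats.f.sf(F, 1, n - 2)   # small p: the line predicts better than the mean alone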

Prediction / forecasting
Quantify the strength of the relationship between y and each xⱼ (x₁, x₂, x₃)

A General Linear Model is just any model that describes the data in terms of a straight line. Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept:

y = bx + a + ε

Multiple regression is used to determine the effect of a number of independent variables, x₁, x₂, x₃, etc., on a single dependent variable, y.
The different x variables are combined in a linear way, and each has its own regression coefficient:

y = b₀ + b₁x₁ + b₂x₂ + … + bₙxₙ + ε

The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for.
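A sketch with NumPy's least-squares solver, using two simulated predictors with known coefficients so the recovered b values can be eyeballed:

import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 + 2.0 * x1 - 0.7 * x2 + rng.normal(scale=0.5, size=n)   # known truth + noise ε

# Design matrix: a column of ones (for b0) plus one column per predictor
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b   # should land near 1.5, 2.0 and -0.7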

Take-home points:
- Correlated doesn't mean related. E.g., any two variables increasing or decreasing over time will show a nice correlation: CO₂ concentration in Antarctica and lodging rental cost in London. Beware in longitudinal studies!
- A relationship between two variables doesn't mean causality (e.g., leaves on the forest floor and hours of sun).

Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y.
Multiple regression models the effect of several independent variables, x₁, x₂, etc., on one dependent variable, y.
Both are types of General Linear Model.

Thank You