# Computing in Archaeology Session 11. Correlation and regression analysis © Richard Haddlesey www.medievalarchitecture.net.


Lecture aims: To introduce correlation and regression techniques

The scattergram: In correlation, we are always dealing with paired scores, so the values of the two variables taken together are used to make a scattergram

Example: Quantities of New Forest pottery recovered from sites at varying distances from the kilns

| Site | Distance (km) | Quantity |
| --- | --- | --- |
| 1 | 4 | 98 |
| 2 | 20 | 60 |
| 3 | 32 | 41 |
| 4 | 34 | 47 |
| 5 | 24 | 62 |

Negative correlation: Here we can see that the quantity of pottery decreases as distance from the source increases

Positive correlation: Here we see that the taller a pot, the wider the rim

Curvilinear monotonic relation: Again, the further from the source, the smaller the quantity of artefacts

Arched relationship (non-monotonic): Here we see that the first molar grows with age and is then worn down as the animal gets older

Scattergram: This shows us that scattergrams are the most important means of studying relationships between two variables

REGRESSION: Regression differs from the other techniques we have looked at so far in that it is concerned not just with whether or not a relationship exists, or with the strength of that relationship, but with its nature. In regression analysis we use an independent variable to estimate (or predict) the values of a dependent variable.

Regression equation: $y = f(x)$, where y = y axis (in this case the dependent variable), f = function (of x), and x = x axis (the independent variable)

$y = f(x)$, for example: $y = x$, $y = 2x$, $y = x^2$

General linear equations: y = a + bx, where y is the dependent variable, x is the independent variable, and the coefficients a and b are constants, i.e. they are fixed for a given data set

Therefore: If x = 0 then the equation reduces to y = a, so a represents the point where the regression line crosses the y axis (the intercept). The b constant defines the slope or gradient of the regression line. Thus, for pottery quantity in relation to distance from source, b represents the amount of decrease in pottery quantity for each unit of distance from the source.
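The roles of a and b can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `predict` is hypothetical, and the coefficients are the ones quoted for the pottery regression line later in this session (a = 102.64, b = -1.8).

```python
def predict(x, a, b):
    """Return y = a + b*x for a straight regression line."""
    return a + b * x

# Coefficients quoted for the pottery line later in the session.
a, b = 102.64, -1.8

at_source = predict(0, a, b)   # x = 0, so y reduces to the intercept a
at_10 = predict(10, a, b)      # estimated quantity 10 km from the kilns
at_11 = predict(11, a, b)      # estimated quantity 11 km from the kilns

print(at_source)
print(round(at_11 - at_10, 2))  # change in y for one extra km, i.e. the slope b
```

Moving one kilometre further from the source changes the estimated quantity by b, whatever the starting distance, which is what "slope" means for a straight line.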

Least-squares fit of y = a + bx to the pottery data:

y = 102.64 - 1.8x
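A minimal sketch of how a least-squares line can be computed for the New Forest pottery example. The function name `least_squares` is hypothetical, and the five (distance, quantity) pairs are read from the flattened example table earlier in the transcript, so treat that reading as an assumption.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the two means
    return a, b

# Assumed reading of the example table: sites 1 to 5.
distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]

a, b = least_squares(distance_km, quantity)
print(f"y = {a:.2f} + ({b:.2f})x")
```

With these values the slope comes out at about -1.80 and the intercept at about 102.69. The 102.64 quoted on the slide is what you get if b is rounded to -1.8 before the intercept is calculated.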

CORRELATION: (1) the correlation coefficient, r, which ranges from -1 to +1; (2) its significance

Levels of measurement: nominal (in name only); ordinal (forming a sequence); interval (a sequence with fixed distances); ratio (fixed distances with a datum point)

Interval and ratio data: Product-Moment Correlation Coefficient. Ordinal data: Spearman's Rank Correlation Coefficient.

The Product-Moment Correlation Coefficient

Sample: 20 bronze spearheads (n = 20); scattergram of length (cm) and width (cm)

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = +0.67$$
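The product-moment formula translates directly into code. The sketch below is an illustration only: the function name `product_moment_r` is hypothetical, and because the spearhead measurements are not reproduced in this transcript, it is run on the New Forest pottery values from the earlier example (an assumed reading of that flattened table) rather than on the spearheads.

```python
import math

def product_moment_r(xs, ys):
    """Product-moment correlation coefficient via the sums-based formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Pottery example: distance from the kilns (km) and quantity recovered.
distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]

r = product_moment_r(distance_km, quantity)
print(round(r, 2))
```

For the pottery data r is strongly negative (roughly -0.97), which agrees with the negative correlation seen in the scattergram.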

Test of product moment correlation coefficient

- H0: true correlation coefficient = 0
- H1: true correlation coefficient ≠ 0
- Assumptions: both variables approximately random
- Sample statistics needed: n and r
- Test statistic: TS = r
- Table: product moment correlation coefficient table

n = 20, r = 0.67, p < 0.01
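The slides test r by looking it up in a product moment correlation coefficient table. As a cross-check, and not the method shown on the slides, r can be converted to a Student's t value with n - 2 degrees of freedom. In this sketch the function name `t_from_r` is hypothetical, and the critical value 2.878 (two-tailed, 1% level, 18 degrees of freedom) is taken from standard t tables.

```python
import math

def t_from_r(r, n):
    """Convert a correlation coefficient to a t statistic with n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1 - r * r))

n, r = 20, 0.67
t = t_from_r(r, n)

# Two-tailed 1% critical value of Student's t for 18 degrees of freedom,
# taken from standard tables (not from the slides).
T_CRIT_DF18_1PCT = 2.878

significant = abs(t) > T_CRIT_DF18_1PCT
print(f"t = {t:.2f}, significant at the 1% level: {significant}")
```

The t value is about 3.8, which exceeds the critical value, consistent with the p < 0.01 reported for the spearheads.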

Spearman's Rank Correlation Coefficient (r_s)

- H0: true correlation coefficient = 0
- H1: true correlation coefficient ≠ 0
- Assumptions: both variables at least ordinal
- Sample statistics needed: n and r_s
- Test statistic: TS = r_s
- Table: Spearman's rank correlation coefficient table
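The slides do not show how r_s is calculated, so the following is a sketch under stated assumptions: it uses the common shortcut formula r_s = 1 - 6Σd² / (n(n² - 1)), which is only valid when there are no tied values, and the function names `ranks` and `spearman_rs` are hypothetical. No ranked data set is given for this test, so the New Forest pottery values from the earlier example are reused (an assumed reading of that flattened table).

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient (no-ties shortcut formula)."""
    n = len(xs)
    # d is the difference between the two ranks of each paired observation.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Pottery example: distance from the kilns (km) and quantity recovered.
distance_km = [4, 20, 32, 34, 24]
quantity = [98, 60, 41, 47, 62]

rs = spearman_rs(distance_km, quantity)
print(round(rs, 2))
```

Because only the ranks are used, the same calculation works for ordinal data where the product-moment coefficient would not be appropriate. With tied values, average ranks and a different formula would be needed.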
