# Computing in Archaeology Session 12. Multivariate statistics © Richard Haddlesey www.medievalarchitecture.net.


Aims: To introduce the techniques of multivariate analysis: cluster analysis; correspondence analysis; principal components and factor analysis; multiple regression; discriminant analysis. Key text: Fletcher & Lock 2005, Digging Numbers.

Introduction to multivariate analysis: In earlier lectures we have seen examples of univariate analysis, using such techniques as simple bar charts, frequency tables of one variable and calculation of a simple sample mean. When two variables are involved, as in clustered bar charts and scatterplots, or when we are comparing the means of two groups or asking whether there is any association between two variables, we are using techniques of bivariate analysis.

Introduction to multivariate analysis: When more than two variables are involved, however, we are dealing with multivariate analysis.

SPSS: These techniques require the use of a suitable statistical package, such as SPSS, because of the considerable computation involved. Consequently, the approach of working examples by hand used in earlier lectures is not relevant here, and we will not be going into the statistical and mathematical details behind the techniques.

Techniques discussed. Type A: reduction and grouping. Given several measurements (ordinal, interval or presence/absence) on each of many objects (i.e. several variables and many cases), is it possible to reduce the number of variables while still maintaining the information in the data? Using either the original variables or the new reduced set, can these objects be put into groups or clusters so that within each group the objects are similar but between groups there are interpretable differences?

Techniques discussed. Type B: prediction. Given several measurements (ordinal, interval or presence/absence) on each of many objects (i.e. several variables and many cases), with one of the variables of particular interest, is it possible to predict this variable from the others and, if so, which variables are important in this prediction?

Type A techniques: cluster analysis; correspondence analysis; principal components and factor analysis (PCA).

Type B techniques: multiple regression; discriminant analysis.

Type A (reduction and grouping): cluster analysis. We may wish to ask: can spearheads be grouped or clustered, so that those within a cluster are similar to each other but there are important differences between the clusters? That is, if we group by dimension, creating clusters of like-sized spearheads, will this show a difference between the various size clusters?

Hierarchical cluster analysis: Most statistics packages offer a standard clustering method called hierarchical cluster analysis. It starts by making each spearhead a single cluster. We then tell it how we want the clusters produced, and SPSS will merge the single clusters step by step until they form one big cluster. It will then output the results, provide information on cluster membership and indicate how good the clustering has been (i.e. how similar the members are).
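The merging procedure described above can be sketched in a few lines of Python. This is only an illustration, not what SPSS does internally: the spearhead measurements are invented, and single linkage with Euclidean distance is an assumed choice of how the clusters are produced (a statistics package offers several alternatives).

```python
import math

# Hypothetical spearhead measurements in cm: (maximum length, maximum width).
spearheads = {
    "A": (10.0, 2.0),
    "B": (11.0, 2.2),
    "C": (10.5, 2.1),
    "D": (25.0, 5.0),
    "E": (26.0, 5.5),
}


def cluster_distance(c1, c2, data):
    """Single linkage (assumed here): distance between the closest pair of members."""
    return min(math.dist(data[a], data[b]) for a in c1 for b in c2)


def hierarchical_clustering(data):
    """Start with one cluster per object and merge the two closest until one remains."""
    clusters = [(name,) for name in sorted(data)]
    merges = []  # (left cluster, right cluster, distance at which they were joined)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], data)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = hierarchical_clustering(spearheads)
for left, right, distance in merges:
    print(f"merge {left} + {right} at distance {distance:.2f}")
```

With these made-up values the three small spearheads and the two large ones join at short distances, and the two groups only join each other at a much larger distance, which is the kind of pattern that suggests two size clusters.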

Dendrograms: The way to visualise the clusters as they are formed, as an aid to deciding how many are significant, is by asking the software to produce a dendrogram.
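As a rough illustration of how a dendrogram helps with that decision, the sketch below prints the distance at which each merge happens and looks for the largest jump between successive merges. The merge history is invented, the text bars are only a crude stand-in for a real dendrogram plot, and cutting at the largest jump is one common rule of thumb, not a formal test of significance.

```python
# Hypothetical merge history from a hierarchical clustering of five spearheads:
# (clusters joined, distance at which they were joined), in order of merging.
merge_history = [
    ("A + C", 0.51),
    ("B + AC", 0.51),
    ("D + E", 1.12),
    ("ABC + DE", 14.28),
]


def suggest_cluster_count(history):
    """Cut the tree just before the largest jump in merge distance (a heuristic)."""
    heights = [height for _, height in history]
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    largest = gaps.index(max(gaps))
    # n objects need n - 1 merges; stopping after (largest + 1) merges
    # leaves n - (largest + 1) clusters.
    n_objects = len(history) + 1
    return n_objects - (largest + 1)


# Crude text stand-in for a dendrogram: bar length grows with merge distance.
for label, height in merge_history:
    print(f"{label:<10} {'#' * max(1, round(height * 3))} {height:.2f}")

print("suggested number of clusters:", suggest_cluster_count(merge_history))
```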

Type B (prediction): multiple regression. We have already covered the theory of prediction and regression in the previous lecture. Although we are now talking about multiple regression, the principle is the same and is best understood through the practical session to follow.

Type B (prediction): multiple regression. We may ask: can the length of a spear be predicted if the tip is missing? Previously we discussed correlation and regression between two variables; multiple regression allows us to use several predictor variables.

Multiple regression: Multiple regression will produce a linear equation relating spear length, the dependent variable, to several independent variables such as socket length, maximum width, width of upper socket and width of lower socket. Both the dependent variable (the one to be predicted) and the independent variables (the ingredients for this prediction) must be measured on an interval scale or be presence/absence data.
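To make the idea of a fitted linear equation concrete, here is a minimal sketch of multiple regression by ordinary least squares, solved through the normal equations in plain Python. The function names are hypothetical, only two of the predictors mentioned above are used, and the measurements are invented and constructed to follow an exact linear rule so that the result is easy to check; real data would not fit perfectly, and in practice a package such as SPSS would do the computation.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_multiple_regression(predictors, response):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in predictors]  # column of 1s for the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefficients, values):
    """Apply the fitted linear equation to one set of predictor values."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], values))


# Invented data in cm: (socket length, maximum width) for six complete spearheads,
# with lengths built from length = 2 + 1.5 * socket length + 3 * maximum width.
predictors = [(4.0, 2.0), (5.0, 2.5), (6.0, 2.2), (7.0, 3.5), (8.0, 3.0), (9.0, 4.1)]
lengths = [14.0, 17.0, 17.6, 23.0, 23.0, 27.8]

coefficients = fit_multiple_regression(predictors, lengths)
print("intercept and slopes:", [round(b, 3) for b in coefficients])
print("predicted length for a broken spearhead:",
      round(predict(coefficients, (6.5, 3.0)), 2))
```

The last line mirrors the question in the lecture: the spearhead with the missing tip still has a measurable socket length and maximum width, and the fitted equation turns those into an estimated length.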