Published by Claire McDonald. Modified over 3 years ago.

1
Computing in Archaeology Basic Statistics Week 8 (25/04/07) © Richard Haddlesey www.medievalarchitecture.net

2
Aims
To familiarise ourselves with KEY statistical terms and their meanings
To understand the use of stats in archaeology
To assign variables appropriate levels of measurement at the recording level

3
Key texts

4
Basic Stats
Batch: Post holes
Variables: Length, area, diameter
Case: Post hole ID

5
Variables
Variables are measured according to one of FOUR levels
1. Nominal = arbitrary name
2. Ordinal = sequence with no distance
3. Interval = sequence with fixed distance
4. Ratio = sequence with a fixed datum

6
Vince NOIR
Nominal
Ordinal
Interval
Ratio

7
Nominal examples
Condition
Age
Diameter
Length
Context
Period

8
Ordinal examples
Condition
1. Excellent
2. Good
3. Fair
4. Poor
Here 2 lies between 1 and 3, but the distances between the categories are unlikely to be equal

9
Interval examples
Period
1. Late Bronze (1200-650)
2. Early Iron (649-100)
3. Late Iron (100+)
Here, if we have 3 artefacts (a, b and c) dated 150BC, 300BC and 450BC, b is an equal distance from a and c, but c is not three times as old as a.
This is because the scale has no fixed datum (its zero point is arbitrary).

10
Ratio examples
Age instead of period
1000 ya (years ago) is twice 500 ya
20kg is twice 10kg
Ratio is the highest level of measurement because it has a datum

11
Nominal, Ordinal and Interval
[Images: Mortlake style bowl, Fengate style bowl, Grooved ware jar]

12
Note!
Avoid using 0 or 1 to indicate such variables as yes or no, as we may need to distinguish between "no" and "no data"
Also, when recording presence or absence, you may wish to add a "missing" value to avoid confusion
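
One way to keep "no" distinct from "no data" is a three-valued code. The following is a minimal Python sketch, not the lecturer's method: the `Presence` enumeration, the post hole IDs and the charcoal observations are all hypothetical and serve only as illustration.

```python
from enum import Enum


class Presence(Enum):
    """Three explicit states, so 'absent' is never confused with 'not recorded'."""
    PRESENT = "present"
    ABSENT = "absent"
    MISSING = "missing"  # no observation was made or recorded


# Hypothetical records: was charcoal observed in each post hole?
charcoal = {
    "PH001": Presence.PRESENT,
    "PH002": Presence.ABSENT,
    "PH003": Presence.MISSING,
}


def count_states(records):
    """Count how many cases fall into each of the three states."""
    counts = {state: 0 for state in Presence}
    for state in records.values():
        counts[state] += 1
    return counts


counts = count_states(charcoal)
print(counts[Presence.ABSENT], counts[Presence.MISSING])
```

With a plain 0/1 code, PH002 and PH003 would both have to be stored as 0, and the difference between them would be lost.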

13
Further distinction
Nominal and Ordinal = categorical = qualitative
Interval and Ratio = continuous = quantitative

14
Coding
Nominal and Ordinal variables often need coding, to minimise errors, via a keyword index
con = context
str = stray find
set = settlement
bur = burial
Avoid 1, 2, 3, etc., as you will have to keep looking up their meanings, which is time consuming

15
Coding NOTE! EVERY DATA VALUE MUST HAVE A CODE AND ONLY ONE CODE!
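
The "one code and only one code" rule can be checked mechanically. A minimal Python sketch, assuming the four keyword codes listed on the coding slide; the helper names `check_one_to_one` and `encode` are hypothetical and not part of the lecture.

```python
# Keyword codes taken from the coding slide.
CODES = {
    "con": "context",
    "str": "stray find",
    "set": "settlement",
    "bur": "burial",
}


def check_one_to_one(codes):
    """Return True if no two codes share the same meaning."""
    meanings = list(codes.values())
    return len(meanings) == len(set(meanings))


def encode(value, codes=CODES):
    """Return the single code for a data value, or raise if it has none or several."""
    matches = [code for code, meaning in codes.items() if meaning == value]
    if len(matches) != 1:
        raise ValueError(f"{value!r} must have exactly one code, found {len(matches)}")
    return matches[0]


print(encode("burial"))
print(check_one_to_one(CODES))
```

A value with no code (for example a find type that was never added to the index) raises an error rather than being silently skipped.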

16
Grouping
Good for periods, as in
Late Bronze (1200-650)
Early Iron (649-100)
Late Iron (100+)
NOTE: it is better to record as a continuous variable (e.g. 780BC), then group as an output (e.g. Late Bronze)
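
Recording the date and grouping only at output could look like the following Python sketch. It assumes dates are stored as positive years BC. The function name `period_for` is hypothetical, and the handling of the boundary at 100 BC is an assumption, because the slide lists 100 under both Early Iron and Late Iron.

```python
def period_for(year_bc):
    """Group a date in years BC into one of the three periods on the slide.

    Assumption: 100 BC itself is placed in Early Iron, and "100+" is read
    as "later than 100 BC". The slide leaves this boundary ambiguous.
    """
    if 650 <= year_bc <= 1200:
        return "Late Bronze"
    if 100 <= year_bc <= 649:
        return "Early Iron"
    if 0 < year_bc < 100:
        return "Late Iron"
    return None  # outside the ranges given on the slide


dates_bc = [780, 649, 300, 50]  # hypothetical recorded dates
print([period_for(d) for d in dates_bc])
```

Because the original dates are kept, the period boundaries can be changed later without re-recording anything.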

17
Good Practice
Always keep a CLEAN version of the original data set

18
Exploring the data

19
example data set

20
univariate frequency table
species  frequency
cattle         187
sheep          109
pig             78
horse           21
Total          395

21
bivariate frequency table
species  pits  ditches  Total
cattle     67      120    187
sheep      63       46    109
pig        41       37     78
horse       3       18     21
Total     174      221    395

22
species  pits     %  ditches     %  Total
cattle     67   39%      120   54%    187
sheep      63   36%       46   21%    109
pig        41   24%       37   17%     78
horse       3    2%       18    8%     21
Total     174  100%      221  100%    395
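
The totals and column percentages in the bivariate table can be reproduced from the raw counts. A minimal Python sketch using the counts from the slides; the variable and function names are hypothetical.

```python
# Bone counts by species and feature type, as given in the bivariate table.
counts = {
    "cattle": {"pits": 67, "ditches": 120},
    "sheep": {"pits": 63, "ditches": 46},
    "pig": {"pits": 41, "ditches": 37},
    "horse": {"pits": 3, "ditches": 18},
}

features = ["pits", "ditches"]

# Column totals: all species within one feature type.
column_totals = {f: sum(row[f] for row in counts.values()) for f in features}

# Row totals: one species across both feature types.
row_totals = {species: sum(row.values()) for species, row in counts.items()}


def column_percent(species, feature):
    """Share of a feature type's bones that belong to one species, in whole percent."""
    return round(100 * counts[species][feature] / column_totals[feature])


for species in counts:
    print(species, [column_percent(species, f) for f in features], row_totals[species])
```

The rounded percentages for pits sum to 101 rather than 100, which is an effect of rounding each value separately.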

23
Multivariate
Multivariate techniques tend to operate on a table, or matrix, of items described in terms of a set of variables

24
Pictorial displays for categorical data

25
bar chart

26
multiple bar chart

27
pie chart

28
Pictorial displays for continuous data

29
histogram

32
Basic descriptive statistics: mode, median, mean, range, variance, standard deviation

33
pottery fragments (weights in grams): 2, 2, 3, 5, 8

34
pottery fragments (weights in grams): 2, 2, 3, 5, 8 Mode = 2

35
Mode
Mode is the only way to measure average/typical at the Nominal level
If there are two modes, the data are bimodal (1,2,3,3,6,6,7,8,9)
Three = trimodal, etc.
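
Python's standard library can report every mode, which makes bimodal data easy to spot. A short sketch using `statistics.multimode` (available from Python 3.8); the `contexts` list is hypothetical and only shows that the mode also works on nominal codes.

```python
from statistics import multimode

weights = [2, 2, 3, 5, 8]                # pottery weights from the slides
bimodal = [1, 2, 3, 3, 6, 6, 7, 8, 9]    # bimodal example from the slide
contexts = ["con", "bur", "con", "set"]  # hypothetical nominal data

print(multimode(weights))   # one mode
print(multimode(bimodal))   # two modes, so the data are bimodal
print(multimode(contexts))  # the mode needs no numbers at all
```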

36
pottery fragments (weights in grams): 2, 2, 3, 5, 8 Mode = 2 Median = 3

37
Median
Best for ordinal and above
If the number of values is even, take the midpoint of the two middle values
(1,2,3,4,5,6,7,8: (4+5)/2 = 4.5)
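
The odd and even cases can be written out explicitly. A minimal Python sketch of the median; the function is a hand-written illustration, and the built-in `statistics.median` gives the same results.

```python
def median(values):
    """Return the middle value, or the midpoint of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[middle]
    # Even number of values: average the two values either side of the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 2, 3, 5, 8]))
print(median([1, 2, 3, 4, 5, 6, 7, 8]))
```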

38
pottery fragments (weights in grams): 2, 2, 3, 5, 8 Mode = 2 Median = 3 Mean = (2+2+3+5+8)/5 = 4

39
Mean
The most commonly used average; it only works for interval and ratio data
It is the most important measure of position because a lot of further statistical analyses are based on it
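
The rules given so far for which average suits which level of measurement can be collected in a small lookup. This Python sketch only restates the slides (mode for nominal, median for ordinal and above, mean for interval and ratio); the names `PERMITTED_AVERAGES` and `can_use` are hypothetical.

```python
# Which measures of position are meaningful at each level of measurement,
# as stated on the mode, median and mean slides.
PERMITTED_AVERAGES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}


def can_use(measure, level):
    """Return True if the measure is meaningful for data at this level."""
    return measure in PERMITTED_AVERAGES[level]


print(can_use("mean", "ordinal"))
print(can_use("median", "ordinal"))
```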

40
Conclusion
It is important to understand that the mode, median and mean are three quite different measures of position which can give three different values when applied to the same data-set
Data set          Mode  Median  Mean
2, 2, 3, 5, 8     2     3       4
2, 2, 3, 5, 6, 8  2     4       4.333
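
The comparison of the two data sets can be checked with Python's `statistics` module. A minimal sketch; the helper name `summarise` is hypothetical.

```python
from statistics import mean, median, mode


def summarise(data):
    """Return (mode, median, mean) for a data set, with the mean rounded to 3 places."""
    return mode(data), median(data), round(mean(data), 3)


first = [2, 2, 3, 5, 8]
second = [2, 2, 3, 5, 6, 8]

print(summarise(first))
print(summarise(second))
```

Adding the single value 6 leaves the mode unchanged but moves both the median and the mean.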

41
The skew
Symmetrical
Positive skew
Negative skew

42
Measures of variability – the spread

43
pottery fragments (weights in grams): 2, 2, 3, 5, 8
Range = max - min
8 - 2 = 6
Very simple and of limited use

44
variance key:

45
pottery fragments (weights in grams): 2, 2, 3, 5, 8
variance ($s^2$)
(Mean = (2+2+3+5+8)/5 = 4)
$$s^2 = \frac{(2-4)^2 + (2-4)^2 + (3-4)^2 + (5-4)^2 + (8-4)^2}{5} = \frac{26}{5} = 5.2$$

46
variance standard deviation

47
pottery fragments (weights in grams): 2, 2, 3, 5, 8
variance ($s^2$) = 5.2
standard deviation $s = \sqrt{s^2} = \sqrt{5.2} \approx 2.28$
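
The variance and standard deviation of the pottery weights can be checked in a few lines of Python. This sketch divides by the number of values (5), as the slide does; the function name `population_variance` is hypothetical. Note that many packages divide by n - 1 by default (the sample variance), which would give 6.5 for these data instead of 5.2.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance


def population_variance(values):
    """Mean squared deviation from the mean, dividing by n as on the slide."""
    n = len(values)
    centre = sum(values) / n
    return sum((x - centre) ** 2 for x in values) / n


weights = [2, 2, 3, 5, 8]  # pottery weights in grams

s2 = population_variance(weights)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))

# The standard library's population functions agree with the hand calculation.
print(isclose(pvariance(weights), s2), isclose(pstdev(weights), s))
```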

48
Summary
Variables are measured according to one of FOUR levels
1. Nominal = arbitrary name
2. Ordinal = sequence with no distance
3. Interval = sequence with fixed distance
4. Ratio = sequence with a fixed datum

49
Summary
Measures of position (average/typical)
Mode
Median
Mean
Measures of variability (the spread)
Range
Variance
Standard Deviation
