Computing in Archaeology Basic Statistics Week 8 (25/04/07) © Richard Haddlesey www.medievalarchitecture.net.


1 Computing in Archaeology Basic Statistics Week 8 (25/04/07) © Richard Haddlesey www.medievalarchitecture.net

2 Aims
To familiarise ourselves with KEY statistical terms and their meanings
To understand the use of stats in archaeology
To assign variables appropriate levels of measurement at the recording level

3 Key texts

4 Basic Stats
Batch: post holes
Variables: length, area, diameter
Case: post hole ID

5 Variables
Variables are measured according to one of FOUR levels
1. Nominal = arbitrary name
2. Ordinal = sequence with no distance
3. Interval = sequence with fixed distance
4. Ratio = sequence with a fixed datum

6 Vince NOIR
Nominal
Ordinal
Interval
Ratio

7 Nominal examples: Condition, Age, Diameter, Length, Context, Period

8 Ordinal examples
Condition
1. Excellent
2. Good
3. Fair
4. Poor
Here 2 lies between 1 and 3, but the distances between neighbouring ranks are unlikely to be equal

9 Interval examples
Period
1. Late Bronze (1200-650)
2. Early Iron (649-100)
3. Late Iron (100+)
Here, if we have 3 artefacts (a, b and c) dated 150BC, 300BC and 450BC, although b may be equal distance between a and c, c is not twice as old as a.
This is because there is no datum.

10 Ratio examples
Age instead of period
1000 ya is twice 500 ya
20kg is twice 10kg
Ratio is the highest level of measurement because it has a datum
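A minimal sketch of the interval/ratio distinction. It assumes dates are stored as positive BC years and that ages are counted back from a hypothetical reference year of AD 2007; both the reference year and the helper `age_in_years` are illustrative, and the missing year zero is ignored for simplicity. Differences between BC dates are meaningful, but the ratio of two BC numbers is not the ratio of the ages.

```python
REFERENCE_YEAR_AD = 2007  # assumed "present"; not taken from the slides


def age_in_years(year_bc, reference_year_ad=REFERENCE_YEAR_AD):
    """Approximate age of a BC date in years (ignores the missing year zero)."""
    return year_bc + reference_year_ad


a, b, c = 150, 300, 450  # artefact dates in years BC

# Interval property: equal distances between dates are meaningful.
equal_spacing = (b - a) == (c - b)

# Dividing raw BC numbers only reflects the arbitrary calendar origin.
calendar_ratio = c / a

# Ages measured from a fixed datum can be compared as ratios.
age_ratio = age_in_years(c) / age_in_years(a)

print(equal_spacing, calendar_ratio, round(age_ratio, 2))
```

Under these assumptions the calendar numbers differ by a factor of three, while the ages differ by well under a factor of two.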

11 Nominal, Ordinal and Interval [Figure: Mortlake style bowl, Fengate style bowl, Grooved ware jar]

12 Note!
Avoid using 0 or 1 to indicate such variables as yes or no, as we may need to know whether a value means no or no data
Also, when using presence or absence, you may wish to add missing to avoid confusion
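One way to keep "no" and "no data" apart is a three-valued code rather than 0/1. The sketch below is illustrative only: the `Presence` class, the post hole IDs and the charcoal observations are hypothetical.

```python
from collections import Counter
from enum import Enum


class Presence(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    MISSING = "missing"  # not recorded; distinct from a definite "absent"


# Hypothetical records: was charcoal observed in each post hole?
charcoal = {
    "PH01": Presence.PRESENT,
    "PH02": Presence.ABSENT,
    "PH03": Presence.MISSING,
    "PH04": Presence.ABSENT,
}

counts = Counter(charcoal.values())
# Only cases with a definite observation count towards the denominator.
recorded = counts[Presence.PRESENT] + counts[Presence.ABSENT]

print(f"present in {counts[Presence.PRESENT]} of {recorded} recorded cases; "
      f"{counts[Presence.MISSING]} missing")
```

With a plain 0/1 coding, PH03 would have had to be entered as 0 and would be indistinguishable from a genuine absence.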

13 Further distinction
Nominal and Ordinal = categorical = qualitative
Interval and Ratio = continuous = quantitative

14 Coding
Nominal and Ordinal often need coding, to minimise errors, via a keyword index
con = context
str = stray find
set = settlement
bur = burial
Avoid 1, 2, 3, etc., as you will have to keep looking up their meanings, which is time consuming

15 Coding NOTE! EVERY DATA VALUE MUST HAVE A CODE AND ONLY ONE CODE!
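The "one code per value" rule can be checked mechanically. The sketch below uses the four keyword codes from the Coding slide; the functions `codebook_is_unambiguous` and `encode` are hypothetical helpers, not part of any standard package.

```python
# Keyword index from the Coding slide: code -> meaning.
CODES = {
    "con": "context",
    "str": "stray find",
    "set": "settlement",
    "bur": "burial",
}


def codebook_is_unambiguous(codes):
    """True if no two codes stand for the same data value."""
    meanings = list(codes.values())
    return len(meanings) == len(set(meanings))


def encode(value, codes=CODES):
    """Return the single code for a data value; fail if it has none or several."""
    matches = [code for code, meaning in codes.items() if meaning == value]
    if len(matches) != 1:
        raise ValueError(f"{value!r} has {len(matches)} codes; expected exactly one")
    return matches[0]


print(codebook_is_unambiguous(CODES))
print(encode("burial"))
```

A value with no code (for example a hypothetical "hoard") raises an error instead of being silently dropped or mis-coded.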

16 Grouping
Good for periods, as in
Late Bronze (1200-650)
Early Iron (649-100)
Late Iron (100+)
NOTE: it is better to record as a continuous variable (e.g. 780BC), then group as an output (e.g. Late Bronze)
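Recording the date and deriving the period afterwards might look like the sketch below. Dates are assumed to be stored as positive BC years, and "Late Iron (100+)" is read here as "later than 100 BC"; that reading, and the function name `period_for`, are assumptions rather than something the slide states.

```python
def period_for(year_bc):
    """Group a recorded date (positive BC year) into a period label."""
    if 650 <= year_bc <= 1200:
        return "Late Bronze"
    if 100 <= year_bc <= 649:
        return "Early Iron"
    if 0 < year_bc < 100:  # assumed meaning of "100+"
        return "Late Iron"
    return "unassigned"


recorded_dates = [780, 300, 50]  # the raw, continuous values are kept
for year in recorded_dates:
    print(f"{year}BC -> {period_for(year)}")
```

Because the raw dates are kept, the period boundaries can be changed later without re-recording anything.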

17 Good Practice
Always keep a CLEAN version of the original data set

18 Exploring the data

19 example data set

20 univariate frequency table
species   frequency
cattle    187
sheep     109
pig        78
horse      21
Total     395
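A univariate frequency table is simply a count of cases per category. As a sketch, the list below rebuilds one record per bone from the counts on the slide (the individual records are not in the source) and tallies them with `collections.Counter`.

```python
from collections import Counter

# One entry per identified bone, reconstructed from the slide's totals.
bones = ["cattle"] * 187 + ["sheep"] * 109 + ["pig"] * 78 + ["horse"] * 21

freq = Counter(bones)

print(f"{'species':<8}{'frequency':>10}")
for species, n in freq.most_common():
    print(f"{species:<8}{n:>10}")
print(f"{'Total':<8}{sum(freq.values()):>10}")
```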

21 bivariate frequency table
species   pits   ditches   Total
cattle      67       120     187
sheep       63        46     109
pig         41        37      78
horse        3        18      21
Total      174       221     395

22
species   pits         ditches      Total
cattle     67 (39%)    120 (54%)    187
sheep      63 (36%)     46 (21%)    109
pig        41 (24%)     37 (17%)     78
horse       3  (2%)     18  (8%)     21
Total     174 (100%)   221 (100%)   395
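The percentages in this table are column percentages: each count divided by the total for its feature type. A minimal sketch using the counts from the slides follows; the variable names are illustrative.

```python
# Counts of bones by species and feature type, from the bivariate table.
table = {
    "cattle": {"pits": 67, "ditches": 120},
    "sheep": {"pits": 63, "ditches": 46},
    "pig": {"pits": 41, "ditches": 37},
    "horse": {"pits": 3, "ditches": 18},
}
features = ["pits", "ditches"]

# Column totals: all bones found in each feature type.
col_totals = {f: sum(row[f] for row in table.values()) for f in features}

# Column percentages, rounded to whole numbers as on the slide.
col_pct = {
    species: {f: round(100 * row[f] / col_totals[f]) for f in features}
    for species, row in table.items()
}

for species, row in table.items():
    cells = "  ".join(f"{row[f]:>4} ({col_pct[species][f]:>2}%)" for f in features)
    print(f"{species:<8}{cells}  {sum(row.values()):>4}")
```

Because each percentage is rounded separately, a column may add up to 99% or 101% rather than exactly 100%.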

23 Multivariate
These tend to operate on a table, or matrix, of items described in terms of a set of variables

24 Pictorial displays for categorical data

25 bar chart

26 multiple bar chart

27 pie chart

28 Pictorial displays for continuous data

29 histogram

30

31

32 Basic descriptive statistics: mode, median, mean, range, variance, standard deviation

33 pottery fragments (weights in grams): 2, 2, 3, 5, 8

34 pottery fragments (weights in grams): 2, 2, 3, 5, 8 Mode = 2

35 Mode
Mode is the only way to measure average/typical in the Nominal class
If there are two modes then the data are bimodal (1,2,3,3,6,6,7,8,9)
Three = trimodal, etc.
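As a sketch, Python's standard `statistics.multimode` (available from Python 3.8) returns every most-frequent value, so it handles the unimodal and bimodal examples alike and also works on nominal labels. The list of species labels is hypothetical.

```python
from statistics import multimode

weights = [2, 2, 3, 5, 8]              # pottery fragment weights in grams
bimodal = [1, 2, 3, 3, 6, 6, 7, 8, 9]  # example from the Mode slide
species = ["cattle", "sheep", "cattle", "pig"]  # hypothetical nominal data

print(multimode(weights))
print(multimode(bimodal))
print(multimode(species))
```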

36 pottery fragments (weights in grams): 2, 2, 3, 5, 8 Mode = 2 Median = 3

37 Median
Best for ordinal and above
If the number of values is even, take the number halfway between the two middle values
(1,2,3,4,5,6,7,8: (4+5)/2 = 4.5)
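A short sketch of both cases using the standard `statistics` module. For ordinal codes a halfway value has no meaning, so `median_low`, which always returns an actual data value, may be a better fit; the condition ranks below are hypothetical.

```python
from statistics import median, median_low

weights = [2, 2, 3, 5, 8]                # odd number of values
even_count = [1, 2, 3, 4, 5, 6, 7, 8]    # even number of values

print(median(weights))      # the middle value
print(median(even_count))   # halfway between the two middle values

# Hypothetical ordinal data: 1 = Excellent, 2 = Good, 3 = Fair, 4 = Poor.
condition_ranks = [1, 2, 2, 3, 4, 4]
print(median_low(condition_ranks))  # lower of the two middle ranks
```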

38 pottery fragments (weights in grams): 2, 2, 3, 5, 8 Mode = 2 Median = 3 Mean = (2+2+3+5+8)/5 = 4

39 Mean
The most commonly used average, but it will only work for interval and ratio
It is the most important measure of position because a lot of further statistical analyses are based on it
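The rules from the Mode, Median and Mean slides can be summarised as a lookup from level of measurement to the averages that make sense at that level. This is only a sketch; the `Level` class and the `valid_averages` function are hypothetical names.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, lowest to highest (NOIR)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def valid_averages(level):
    """Averages that are meaningful at the given level of measurement."""
    averages = ["mode"]               # works at every level
    if level >= Level.ORDINAL:
        averages.append("median")     # needs an ordering
    if level >= Level.INTERVAL:
        averages.append("mean")       # needs fixed distances
    return averages


for level in Level:
    print(f"{level.name:<9}{', '.join(valid_averages(level))}")
```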

40 Conclusion
It is important to understand that the mode, median and mean are three quite different measures of position which can give three different values when applied to the same data-set
          2, 2, 3, 5, 8    2, 2, 3, 5, 6, 8
Mode      2                2
Median    3                4
Mean      4                4.333
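The comparison can be reproduced with the standard `statistics` module. The sketch below computes all three measures for both data sets from this slide; the `summarise` helper is an illustrative name.

```python
from statistics import mean, median, mode

datasets = {
    "five fragments": [2, 2, 3, 5, 8],
    "six fragments": [2, 2, 3, 5, 6, 8],
}


def summarise(values):
    """Return the three measures of position for one data set."""
    return {
        "mode": mode(values),
        "median": median(values),
        "mean": round(mean(values), 3),
    }


summaries = {name: summarise(values) for name, values in datasets.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Adding a single fragment of 6 g leaves the mode unchanged but shifts both the median and the mean.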

41 The skew [Figure: symmetrical, positive skew, negative skew]

42 Measures of variability – the spread

43 pottery fragments (weights in grams): 2, 2, 3, 5, 8
Range = max – min = 8 - 2 = 6
Very simple and of limited use
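A sketch of the range, together with one reason it is of limited use: it depends only on the two extreme values, so a single unusual case dominates it. The 40 g fragment is a hypothetical addition, not part of the original data.

```python
def value_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)


weights = [2, 2, 3, 5, 8]      # pottery fragment weights in grams
with_outlier = weights + [40]  # hypothetical extra heavy fragment

print(value_range(weights))
print(value_range(with_outlier))
```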

44 variance key:

45 pottery fragments (weights in grams): 2, 2, 3, 5, 8
variance ($s^2$)
(Mean = (2+2+3+5+8)/5 = 4)
$$s^2 = \frac{(2-4)^2 + (2-4)^2 + (3-4)^2 + (5-4)^2 + (8-4)^2}{5} = \frac{26}{5} = 5.2$$
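The same calculation step by step, as a sketch. Note that the slide divides by the number of values (5), which corresponds to `statistics.pvariance`; `statistics.variance` divides by one fewer and therefore gives a larger figure for the same data.

```python
from statistics import pvariance, variance

weights = [2, 2, 3, 5, 8]  # pottery fragment weights in grams

n = len(weights)
mean_weight = sum(weights) / n                                  # 4.0
squared_deviations = [(x - mean_weight) ** 2 for x in weights]  # 4, 4, 1, 1, 16
s2 = sum(squared_deviations) / n                                # divide by n, as on the slide

print(s2)
print(pvariance(weights))  # library equivalent of the slide's formula
print(variance(weights))   # divides by n - 1 instead
```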

46 variance standard deviation

47 pottery fragments (weights in grams): 2, 2, 3, 5, 8
variance ($s^2$) = 5.2
standard deviation $s = \sqrt{\text{variance}} = \sqrt{5.2} = 2.28$
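As a sketch, the standard deviation is the square root of the variance, which puts the spread back into the original units (grams). `statistics.pstdev` matches the divide-by-n variance used on these slides.

```python
from math import sqrt
from statistics import pstdev, pvariance

weights = [2, 2, 3, 5, 8]  # pottery fragment weights in grams

s2 = pvariance(weights)    # variance, dividing by n
s = sqrt(s2)               # standard deviation in grams

print(round(s, 2))
print(round(pstdev(weights), 2))  # same result from the library function
```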

48 Summary
Variables are measured according to one of FOUR levels
1. Nominal = arbitrary name
2. Ordinal = sequence with no distance
3. Interval = sequence with fixed distance
4. Ratio = sequence with a fixed datum

49 Summary
Measures of position (average/typical)
Mode
Median
Mean
Measures of variability (the spread)
Range
Variance
Standard Deviation

