Using Statistical techniques in Geography


Using Statistical Techniques in Geography: Spearman’s Rank Correlation Coefficient (SRCC)

An example: You may want to work out the relationship between two variables, e.g. in a town’s CBD, the number of shoppers (pedestrians) and the number of convenience shops that are available.

Let’s think about a hypothesis. A hypothesis is a statement which is used to test the relationship between two variables. In our example our hypothesis could be: ‘As the number of pedestrians increases, the number of convenience shops also increases.’ At the end of our calculations we will decide whether to accept or reject this hypothesis.

Creating a ‘null hypothesis’. Prior to any investigation a ‘null hypothesis’ is always set up. This just states that there is NO relationship between the two variables. So in our example the null hypothesis would be: ‘There is no relationship between the number of pedestrians and the number of convenience shops.’ At the end of our calculations we decide whether to accept or reject this null hypothesis.

Firstly, construct a scattergraph based on the collected data. What does this graph show?

Then what?
1. Work out what the scattergraph shows.
2. If there is a positive correlation then as one variable increases, so will the other variable.
3. Likewise, if there is a negative correlation then as one variable increases, the other variable will decrease.
4. Sometimes the scattergraph will show no evidence of correlation at all.
5. However, if either a negative or positive correlation is revealed then it is worthwhile doing the next step, the statistical test: Spearman’s Rank Correlation Coefficient…

How and why to use Spearman’s Rank… If you have done scattergraphs, Spearman’s Rank offers you the opportunity to use a statistical test to get a value which can determine the strength of the relationship between two sets of data…

So how do we do it? By using an equation:
rs = 1 - (6Σd²) / (n³ - n)
This is the equation, and it looks complicated, so let’s think carefully about how we can do this…
rs is the overall value: the Spearman’s Rank Correlation Coefficient.
The fraction has to be worked out before the value is taken away from 1.
The Σ sign means ‘the total of’.
d² is the first thing we will try to establish in our ranked tables (see next slides).
‘n’ refers to the number of sites or values you will process, so if there were 15 river sites, ‘n’ would be 15. If there were 20 pedestrian count zones, ‘n’ would be 20, and so on…
YOU DON’T HAVE TO LEARN THE FORMULA, JUST KNOW HOW TO USE IT!
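The equation can also be sketched as a small Python function. This is only an illustration under stated assumptions: the function name `spearman_from_sum_d2` and its parameter names are made up for this example and do not come from the slides.

```python
def spearman_from_sum_d2(sum_d2: float, n: int) -> float:
    """Spearman's rank coefficient from the total of the squared rank differences.

    sum_d2 is the total of the d2 column; n is the number of sites or zones.
    (Function and parameter names are hypothetical, chosen for this sketch.)
    """
    if n < 2:
        raise ValueError("need at least two paired values")
    # Work out the fraction first...
    fraction = (6 * sum_d2) / (n ** 3 - n)
    # ...and only then take it away from 1.
    return 1 - fraction


# Hypothetical check: if every pair of ranks matches, every d is 0.
print(spearman_from_sum_d2(0, 10))  # 1.0
```

Keeping the fraction and the subtraction as two separate steps mirrors the order in which the calculation is done by hand.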

An example

[Table: Zone | Pedestrians | Rank | Convenience shops | Rank (r) | Difference (d) | d²]
1. Here we have laid out a table of each of the twelve zones in a town.
2. Pedestrian counts for each zone go here.
3. The number of convenience shops for each zone goes here.
4. We now need to rank the data (the two highlighted columns); this is shown overleaf.

[Table as before, with the Rank column for pedestrians filled in]
You will see here that in this example, the pedestrian counts have been ranked from highest to lowest, with the highest value (70) being ranked as number 1 and the lowest value (8) as number 12.

[Table as before]
So that was fairly easy… We now need to do the next column, for convenience shops, too. But hang on! Now we have a problem… We have two values that are 8, so what do we do? The next two ranks would be 4 and 5; we add the two ranks together and divide the total by two. So these two values would both be given rank 4.5.

[Table as before, with the tied convenience shop ranks 4.5 and 6.5 filled in]
This is normally the point where one of the biggest mistakes is made. Having gone from 4.5, students will often then rank the next value as 5. But they can’t! Why not? Because we have already used rank number 5! So we would need to go to rank 6. This situation is complicated further by the fact that the next two values are also tied. So we do the same again: add ranks 6 and 7 and divide the total by 2 to get 6.5.
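The tie rule described above (share out the ranks that the tied values would have taken, then carry on from the next unused rank) can be sketched in Python. The function name `rank_highest_first` and the shop counts are hypothetical, chosen only so that ties fall on ranks 4 and 5 and on ranks 6 and 7 as in the slides; they are not the presentation's data.

```python
def rank_highest_first(values):
    """Rank values so the highest gets rank 1; tied values share the mean rank."""
    # Positions of the values when sorted from highest to lowest.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of tied values starting at this position.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # The tied values share ranks pos+1 .. end+1, so each gets their mean.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        # Carry on from the next unused rank.
        pos = end + 1
    return ranks


# Hypothetical shop counts with two pairs of ties.
shops = [15, 11, 9, 8, 8, 7, 7, 5]
print(rank_highest_first(shops))
# [1.0, 2.0, 3.0, 4.5, 4.5, 6.5, 6.5, 8.0]
```

Note that the value after the first tie gets 6.5 (shared with its own tie), never 5, because ranks 4 and 5 have both been used up.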

[Table: the two completed rank columns, Rank and Rank (r)]
Having ranked both sets of data we now need to work out the difference (d) between the two ranks. To do this we take the second rank away from the first. This is demonstrated on the next slide.

[Table as before, with the Difference (d) column filled in; for Zone 1, d = 4 - 4.5 = -0.5]
The difference between the two ranks has now been established. So what next? We need to square each of these d values… Don’t worry if you have any negative values here; when we square them (multiply them by themselves) they will become positive.

[Table as before, with the d² column filled in]
So, the first value squared would be 0.25 (-0.5 x -0.5).

So what do we do with these ‘d²’ figures? First we need to add all of the figures in this d² column together. This gives us… 32. Now we can think about doing the actual equation!
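The difference, squaring and totalling steps can be sketched in a few lines of Python. The two rank lists below are hypothetical (five zones rather than the twelve in the slides), so the total is not the 32 found above.

```python
# Hypothetical ranks for five zones (not the presentation's data).
pedestrian_ranks = [1, 2, 3, 4, 5]
shop_ranks = [2, 1, 3.5, 3.5, 5]

# d = first rank minus second rank, zone by zone.
d = [a - b for a, b in zip(pedestrian_ranks, shop_ranks)]

# Squaring each d removes any minus signs.
d_squared = [x * x for x in d]

# Add up the whole d-squared column.
sum_d2 = sum(d_squared)

print(d)          # [-1, 1, -0.5, 0.5, 0]
print(d_squared)  # [1, 1, 0.25, 0.25, 0]
print(sum_d2)     # 2.5
```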

Firstly, let’s remind ourselves of the equation: rs = 1 - (6Σd²) / (n³ - n). In this equation, we know the total of d², which is 32. So the top part of our equation is 6 x 32. We also know what ‘n’ is (the number of sites or zones, 12 in this case), so the bottom part of the equation is (12 x 12 x 12) - 12.

We can now do the equation… (6 x 32) / (12³ - 12) = 192 / 1716. OK, so this gives us a figure of 0.111888111888. Is that us finished? Sadly not!

Back to the equation… I have circled the part of the equation that we have done. Remember that we need to take the value that we have calculated away from 1. Forgetting to do this is probably the second biggest mistake that people make! So… 1 - 0.111888111888 = 0.888
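Written out in one line, using only the figures from the slides (a total of d² of 32 and n = 12), the whole calculation is:

```latex
r_s = 1 - \frac{6 \sum d^2}{n^3 - n}
    = 1 - \frac{6 \times 32}{12^3 - 12}
    = 1 - \frac{192}{1716}
    \approx 1 - 0.1119
    \approx 0.888
```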

So we have our Spearman’s Rank figure… But what does it mean?
[Figure: number line from -1 to +1, with 0.888 marked]
Your value will always be between -1 and +1. As a rough guide, our figure of 0.888 demonstrates that there is a fairly strong positive relationship. It suggests that where pedestrian counts are high, there is a high number of convenience shops. Should the figure be close to -1, it would suggest that there is a negative relationship, and that as one thing increases, the other decreases.

However… Just looking at a line and making an estimation isn’t particularly scientific. To be more sure, we need to look in critical values tables to see the level of significance and strength of the relationship. This is shown overleaf…

N     0.05 level   0.01 level
12    0.591        0.777
14    0.544        0.715
16    0.506        0.665
18    0.475        0.625
20    0.45
22    0.428        0.562
24    0.409        0.537
26    0.392        0.515
28    0.377        0.496
30    0.364        0.478
1. This is a critical values table and the ‘N’ column shows the number of sites or zones you have studied. In our case, we looked at 12 zones.
2. If we look across we can see there are two further columns, one labelled 0.05, the other 0.01. The first, 0.05, means that if our figure exceeds the value, there is less than a 5 in 100 chance that a correlation this strong would appear purely by chance; in other words, we can be 95% confident that a relationship exists. The second, 0.01, means that if our figure exceeds this value, there is less than a 1 in 100 chance that it arose purely by chance, so we can be 99% confident that a relationship exists. We can see that in our example our figure of 0.888 exceeds the value of 0.591 at the 0.05 level and also comfortably exceeds the value of 0.777 at the 0.01 level.

In our example above, we can see that our figure of 0.888 exceeds the values at both the 95% and 99% levels. The figure is therefore highly significant.
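Looking a figure up in the critical values table can be sketched as a dictionary lookup. The values are copied from the table in this presentation; the row for N = 20 is left out because its 0.01 value does not survive in the transcript. The function name `significance` is hypothetical, and comparing the size of the figure (ignoring its sign) is an assumption made so that strong negative correlations are handled too.

```python
# Critical values from the presentation's table: N -> (0.05 level, 0.01 level).
# N = 20 is omitted because its 0.01 value is missing from the transcript.
CRITICAL_VALUES = {
    12: (0.591, 0.777),
    14: (0.544, 0.715),
    16: (0.506, 0.665),
    18: (0.475, 0.625),
    22: (0.428, 0.562),
    24: (0.409, 0.537),
    26: (0.392, 0.515),
    28: (0.377, 0.496),
    30: (0.364, 0.478),
}


def significance(rs, n):
    """Return the strictest level (0.01 or 0.05) that the figure exceeds, else None."""
    if n not in CRITICAL_VALUES:
        raise KeyError(f"no critical values stored for n = {n}")
    level_05, level_01 = CRITICAL_VALUES[n]
    size = abs(rs)  # assumption: compare the size of rs, ignoring its sign
    if size > level_01:
        return 0.01
    if size > level_05:
        return 0.05
    return None


# The worked example: 0.888 with 12 zones.
print(significance(0.888, 12))  # 0.01
```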

We can therefore conclude that this figure is strongly significant and that pedestrian counts and the number of convenience shops are clearly related

Summary of SRCC
1. SRCC can be used to determine the strength of the relationship between two variables. The prerequisite for this is either a positive or negative correlation shown by a scattergraph.
2. The formula for SRCC looks complicated but is not difficult to work out. Each value needs to be determined. The best way to do this is to rank the two sets of data using a table.
3. Once found, the values just need to be put back into the original equation (don’t forget to take the whole thing away from 1!!).

4. When you’ve got your SRCC value (which will always be between -1 and +1) you then need to look it up against a table of ‘critical values’ to test the strength of the relationship.
5. This critical values table is basically a load of numbers with headings at either the 0.05 level or the 0.01 level.
6. To read it off, start with N (the number of values in your example), then look across at both the 0.05 and the 0.01 level.
7. If your value exceeds the critical value at the 0.05 level then you can say that you are 95% certain that the strength of the relationship you have found didn’t occur by chance.

8. If your value exceeds the critical value at the 0.01 level then you can say that you are 99% certain that the strength of the relationship you have found didn’t occur by chance.
9. If your final value exceeds the values at both the 95% and 99% levels then you can say with confidence that the figure is highly significant and that it didn’t occur by chance.
10. Can you accept or reject your null hypothesis?
11. Can you accept or reject your original hypothesis?
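Steps 1 to 3 of the summary (rank both data sets, find the total of d², put it into the equation) can be put together in one short Python sketch. The function names and the five-zone data are hypothetical and are not the presentation's figures. It uses the simple formula from the slides, which is usually treated as an approximation when there are many tied ranks.

```python
def average_ranks(values):
    """Rank from highest (1) to lowest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first rank this value would occupy
        count = ordered.count(v)       # how many values are tied on it
        ranks.append(first + (count - 1) / 2)  # mean of ranks first..first+count-1
    return ranks


def spearman_rank(x, y):
    """Spearman's rank coefficient using rs = 1 - 6*sum(d^2) / (n^3 - n)."""
    if len(x) != len(y):
        raise ValueError("both data sets need the same number of values")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired values")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)


# Hypothetical counts for five zones.
pedestrians = [70, 64, 60, 40, 25]
shops = [22, 19, 21, 8, 5]
print(round(spearman_rank(pedestrians, shops), 3))  # 0.9
```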

Finally… You need to think about how you can use this technique yourself. I would advise that you do scattergraphs for the same sets of data so that you have a direct comparison. How would you use this technique in Blencathra whilst doing a river study?

TASK Try Spearman Rank Correlation Coefficient task sheet.

Using SRCC with Blencathra questions

Data presentation and analysis
Describe one method of analysis that was used to test the validity of your hypothesis and explain why that method was suitable for your investigation. (8 marks)
Outline and justify the use of one or more techniques used to analyse your results. (5 marks)
Summarise the main findings of your enquiry. (4 marks)
Justify the use of one technique used to analyse data in your enquiry. (4 marks)

SRCC: Spearman’s Rank Correlation Coefficient
Why did we then use Spearman’s Rank? It is a statistical test used to test the validity of your hypothesis: to see what the strength of the relationship is, and to test whether it was just a chance result. How did you do it?

1. Start with your hypotheses
1. Set up the NULL HYPOTHESIS: There is no correlation between distance downstream, stream order and discharge.
2. The ALTERNATIVE HYPOTHESIS: As stream order and distance downstream increase, discharge also increases.

2. Rank your data
1. I placed each data set in rank order (the lowest value in each set was ranked first).
2. Next I subtracted one rank from the other for each site and squared the difference (written as d²).
3. I calculated the sum of d² and put it into the equation to get the Spearman’s Rank Correlation Coefficient value.

Spearman’s Rank Correlation Coefficient Results
The Spearman’s Rank coefficient was 0.783. The value is always between -1 and +1. (N.B. You will need to insert your OWN value here!)
At a 95% confidence level it had a critical value of 0.683, allowing us to REJECT the null hypothesis. (The SRCC value needs to be greater than or equal to the critical value.)
At a 98% confidence level it had a critical value of 0.783, EQUAL to our Spearman’s Rank coefficient.
It is a quantitative test (i.e. reliant on numbers, an objective test).
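The decision rule quoted on this slide (reject the null hypothesis when the coefficient is greater than or equal to the critical value) can be sketched as a one-line check. The function name `reject_null` is hypothetical, and comparing the size of the coefficient regardless of sign is an assumption; the three numbers are the ones quoted on the slide.

```python
def reject_null(rs: float, critical_value: float) -> bool:
    """True when the coefficient is greater than or equal to the critical value."""
    # Assumption: compare the size of rs, so a strong negative value also counts.
    return abs(rs) >= critical_value


rs = 0.783  # value quoted on the slide; insert your own

print(reject_null(rs, 0.683))  # True (95% level)
print(reject_null(rs, 0.783))  # True (98% level, exactly equal)
```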

Advantages of Spearman’s Rank technique
1. Objective (not subjective): based on numerical evidence and therefore quantitative and (hopefully) more accurate.
2. Straightforward to compute.
3. Direct comparison between the two variables in the original hypothesis (distance downstream and discharge).
4. Gives a numerical value for the strength of the relationship, which can then be compared to critical values.