Exploratory data analysis, descriptive measures and sampling or, “How to explore numbers in tables and charts”

1 Exploratory data analysis, descriptive measures and sampling or, “How to explore numbers in tables and charts”

2 Outline Types of numerical data Frequency distributions Relationships between data Sampling strategies

3 Four types of numerical data Categorical (or nominal) –Categories: yes/no, male/female… Ordinal –Whole numbers, but showing a rank order (e.g. surveys with frequencies of scores, as in 1-10) Continuous variables (include fractions) –Interval-level measurements (e.g. °C, or dates) No absolute zero; addition and subtraction are meaningful But ratios are not meaningful - so multiplication and division cannot be carried out –Ratio-level measurements (e.g. areas of land) Have a zero - an absolute origin Can be multiplied and divided
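
The difference between interval and ratio measurements can be checked with a small sketch. The two temperatures below are hypothetical values chosen only for illustration: the difference between them stays the same size on the Celsius and Kelvin scales, but the apparent ratio changes with the scale, because the Celsius and Fahrenheit zeros are arbitrary.

```python
# Hypothetical temperatures, chosen only for illustration.
a_celsius, b_celsius = 10.0, 20.0


def to_kelvin(celsius):
    """Convert Celsius to Kelvin (a scale with an absolute zero)."""
    return celsius + 273.15


def to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit (another scale with an arbitrary zero)."""
    return celsius * 9 / 5 + 32


# Differences are meaningful on an interval scale:
# a gap of 10 degrees Celsius is also a gap of 10 kelvin.
diff_celsius = b_celsius - a_celsius
diff_kelvin = to_kelvin(b_celsius) - to_kelvin(a_celsius)

# Ratios are not: "twice as hot" depends entirely on where zero sits.
ratio_celsius = b_celsius / a_celsius
ratio_fahrenheit = to_fahrenheit(b_celsius) / to_fahrenheit(a_celsius)
ratio_kelvin = to_kelvin(b_celsius) / to_kelvin(a_celsius)

print(f"difference: {diff_celsius:.2f} C, {diff_kelvin:.2f} K")
print(f"ratio in Celsius:    {ratio_celsius:.3f}")
print(f"ratio in Fahrenheit: {ratio_fahrenheit:.3f}")
print(f"ratio in Kelvin:     {ratio_kelvin:.3f}")
```

Only the Kelvin ratio reflects a physical quantity, which is why ratio-level descriptions need a scale with an absolute origin.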

4 Discussion Examples of the four kinds of numerical data What sorts of data do geographers work with most?

5 Summary Source: Graphpad 2010

6 Descriptions of frequency for sample data: The sophistication of the machine! –We can put data into a programme, and get ‘statistics’ out, but do we know what they really mean? What basic frequency descriptors are there? –Central values (measures of central tendency) Mode Median Mean –Dispersal Range Quartiles Standard Deviation

7 The mode –The measurement with the maximum frequency In this instance –The mode is “3”

8 The median –The middle measurement when the measurements are placed in order of magnitude In this instance –75 measurements –The 38th measurement –Again “3”

9 The mean, or average What is the mean? –What is loosely called the average –Mean = Σx/n In this example –[(10 x 1) + (20 x 2) + (30 x 3) + (15 x 4)] / 75 –200/75 = 2.667
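
The three central values on slides 7 to 9 can be reproduced with a short sketch. The frequency table below is inferred from the mean calculation on slide 9 (10 measurements of 1, 20 of 2, 30 of 3 and 15 of 4); the variable names are illustrative, not taken from the slides.

```python
from statistics import mean, median, mode

# Frequency table inferred from the worked mean on slide 9:
# measurement value -> number of times it was observed.
frequencies = {1: 10, 2: 20, 3: 30, 4: 15}

# Expand the table into the 75 individual measurements, in order.
measurements = [
    value for value, count in sorted(frequencies.items()) for _ in range(count)
]

n = len(measurements)                # 75 measurements in total
data_mode = mode(measurements)       # value with the highest frequency
data_median = median(measurements)   # the 38th of the 75 ordered values
data_mean = mean(measurements)       # sum of all values divided by n

print(f"n = {n}")
print(f"mode = {data_mode}")
print(f"median = {data_median}")
print(f"mean = {data_mean:.3f}")
```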

10 Measures of dispersal Range –Difference between maximum and minimum (6) Quartiles –Dividing a frequency distribution into quarters (25 in each in this example) –One quarter of the area below first quartile (3) –One quarter of the area above the upper quartile (5) –Median in the middle (4) –Inter-quartile range Difference between quartiles (2)
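
The range and quartiles can be computed as in the sketch below. The chart data behind this slide is not in the transcript, so the frequency table here is a hypothetical one, chosen only because it is consistent with the values quoted on the slide (100 measurements, range 6, quartiles 3 and 5, median 4). Note that `statistics.quantiles` interpolates between neighbouring values, and other quartile conventions can give slightly different answers on other data.

```python
from statistics import quantiles

# Hypothetical frequency table (value -> count), assumed for illustration.
frequencies = {1: 5, 2: 15, 3: 20, 4: 20, 5: 20, 6: 15, 7: 5}

# Expand into 100 ordered measurements.
data = sorted(v for v, count in frequencies.items() for _ in range(count))

# Range: difference between the maximum and the minimum.
data_range = max(data) - min(data)

# Quartiles: the three cut points that split the ordered data into quarters.
lower_quartile, median_value, upper_quartile = quantiles(data, n=4)

# Inter-quartile range: difference between upper and lower quartiles.
inter_quartile_range = upper_quartile - lower_quartile

print(f"range = {data_range}")
print(f"quartiles = {lower_quartile}, {median_value}, {upper_quartile}")
print(f"inter-quartile range = {inter_quartile_range}")
```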

11 Measures of dispersal Standard deviation of a population (square root of the variance) –σ = √{Σ(x − x̄)²/n} –Note for a sample use n-1 (linked to degrees of freedom) In essence –Subtracting the mean (4) from each value –Summing the squares of these differences To remove effect of sign –Dividing by the total To weight it –And then taking the square root to scale it –σn = 1.581
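
The steps listed on slide 11 can be followed one at a time in code. The data is a hypothetical frequency table, assumed here because it has a mean of 4 and a population standard deviation of about 1.581, matching the figures quoted on the slide; the original data is not in the transcript.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical frequency table (value -> count), assumed for illustration.
frequencies = {1: 5, 2: 15, 3: 20, 4: 20, 5: 20, 6: 15, 7: 5}
data = [v for v, count in frequencies.items() for _ in range(count)]
n = len(data)

# Step 1: find the mean.
mean_value = sum(data) / n

# Step 2: subtract the mean from each value and square the difference,
# which removes the effect of sign; then sum the squares.
sum_of_squares = sum((x - mean_value) ** 2 for x in data)

# Step 3: divide by n (population) or n - 1 (sample) to get the variance.
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

# Step 4: take the square root to return to the original units.
population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

print(f"mean = {mean_value}")
print(f"population standard deviation = {population_sd:.3f}")
print(f"sample standard deviation     = {sample_sd:.3f}")

# The standard library gives the same answers directly.
print(f"pstdev = {pstdev(data):.3f}, stdev = {stdev(data):.3f}")
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the gap shrinks as the sample grows.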

12 Descriptors Categorical/nominal –Central tendency - mode Ordinal –Mode or median –Spread through inter-quartile range Interval –Mode, median or mean –Spread with standard deviation Ratio –Mode, median or mean –Spread with standard deviation
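
The pairing of data types and descriptors on slide 12 can be written down as a simple lookup, which is handy as a check before summarising a variable. This is only a sketch: the dictionary and function names are hypothetical, and the entries simply restate the slide.

```python
# Descriptors suggested on slide 12 for each level of measurement.
DESCRIPTORS = {
    "categorical": {"central": ["mode"], "spread": []},
    "ordinal": {
        "central": ["mode", "median"],
        "spread": ["inter-quartile range"],
    },
    "interval": {
        "central": ["mode", "median", "mean"],
        "spread": ["standard deviation"],
    },
    "ratio": {
        "central": ["mode", "median", "mean"],
        "spread": ["standard deviation"],
    },
}


def descriptors_for(level):
    """Return the descriptors listed on the slide for a measurement level."""
    key = level.strip().lower()
    if key == "nominal":  # the slides treat nominal and categorical as one type
        key = "categorical"
    return DESCRIPTORS[key]


for level in ("nominal", "ordinal", "interval", "ratio"):
    entry = descriptors_for(level)
    print(f"{level}: central = {entry['central']}, spread = {entry['spread']}")
```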

13 Summary Source: Graphpad 2010

14 Opportunity for discussion There are different kinds of numbers Measures we use to describe them therefore need to be different

15 Showing relationships How are different sets of data related? How can we best visualise these relationships? –Hans Rosling and Gapminder Use of graphs and maps Time series Independent (x) and dependent (y) variables

16 Clustering techniques Exploring ways to disaggregate data –Clusters within an overall distribution What might the plot on the right represent?

17 Outliers being the interesting data Again, how might we ‘explain’ the distribution on the right? Always seek to explore differences within the data

18 Appropriate representations for various data Columns/histogram Lines Pie charts Scatter graphs Doughnuts Surfaces

19 Distinguishing between cause and effect Independent and dependent variables –X generally independent –Y generally dependent Essential to distinguish between –A relationship between data; and –Actual explanation of cause and effect Importance for analysis –Independent variables explaining variation in dependent variables
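
One common way to express how far an independent variable accounts for variation in a dependent variable is a least-squares line and its r² value. The sketch below uses that technique as an assumption (the slides do not name a method), and the x and y values are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variation in y accounted for by the fitted line.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: x is treated as independent, y as dependent.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

slope, intercept, r_squared = fit_line(x_values, y_values)
print(f"y = {intercept:.2f} + {slope:.2f} x")
print(f"r squared = {r_squared:.2f}")
```

An r² of 0.6 says the line accounts for 60% of the variation in y. It describes a relationship in the data; it does not show that x causes y.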

20 Opportunity for discussion

21 Samples and populations The basic question –Does our sample represent the population as a whole? If so –We can be certain (within specific limits) of our results –We can claim that they are significant Sampling strategy is therefore of crucial importance –And must be discussed in detail in methodologies

22 Sampling strategies Non-probability –Convenience Whoever comes along –Purposive Case study approach: a “typical” place –Quota On basis of criteria such as age, ethnicity, gender Probability –Random Equal likelihood of anyone being selected Use of Random Number Tables –Systematic Selecting at regular intervals
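
The two probability strategies on slide 22 can be sketched as follows. The population is a hypothetical sampling frame of 100 numbered units, and a seeded random number generator stands in for a random number table so that the draw can be repeated.

```python
import random

# Hypothetical sampling frame: 100 units numbered 1 to 100.
population = list(range(1, 101))
sample_size = 10

# Seeded generator (the seed value is arbitrary), standing in for a
# random number table so the same sample can be drawn again.
rng = random.Random(42)

# Random sampling: every unit has an equal likelihood of selection,
# and no unit is chosen twice.
random_sample = rng.sample(population, sample_size)

# Systematic sampling: pick a random starting point, then select
# every k-th unit at a regular interval.
interval = len(population) // sample_size
start = rng.randrange(interval)
systematic_sample = population[start::interval]

print("random sample:    ", sorted(random_sample))
print("systematic sample:", systematic_sample)
```

Systematic sampling is simple to carry out in the field, but it can mislead if the sampling frame has a pattern that repeats at the same interval.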

23 Stratified sampling Assuming an existing pattern in the population –Can be used with both probability and non-probability –Divides the population into strata (subsets) that are each then sampled using one of the other methods Socio-economic groups, gender…
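
A minimal sketch of stratified sampling, assuming random selection within each stratum and the same sampling fraction for every stratum. The population, the "urban" and "rural" strata and the function name are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, rng):
    """Randomly sample the same fraction of units from every stratum."""
    # Divide the population into strata (subsets).
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Sample each stratum separately, taking at least one unit from each.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population of 100 people: 60 urban and 40 rural.
people = [(i, "urban" if i < 60 else "rural") for i in range(100)]

sample = stratified_sample(
    people,
    stratum_of=lambda person: person[1],
    fraction=0.1,
    rng=random.Random(0),  # arbitrary seed so the draw can be repeated
)

urban_count = sum(1 for _, stratum in sample if stratum == "urban")
rural_count = sum(1 for _, stratum in sample if stratum == "rural")
print(f"urban: {urban_count}, rural: {rural_count}")
```

Because each stratum is sampled in proportion to its size, the sample keeps the 60/40 split of the population, which a simple random sample of 10 would not guarantee.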

24 Our samples determine our results Need for sufficient sample size to be significant –How do we know what sample size is needed? –Does the sample represent the population? –Relationship to analytical structure Must justify our sampling strategies in our methodologies Sampling applies equally in quantitative and qualitative research

25 Discussion

