
1 2001: Dissertation Process Measurement in data-poor situations Dr. Mathias (Mat) Disney UCL Geography Office: 113 Pearson Building Tel: 7670 0592 Email: mdisney@ucl.geog.ac.uk www.geog.ucl.ac.uk/~mdisney

2 Overview What do we mean by data-poor? Types of measurement: asking the right question. Types of sampling: looking in the right place. Statistical testing, modelling and parsimony: making best use of what you have.

3 What do we mean by data-poor? Few measurements or observations – fewer than perhaps we would like? Few data are not necessarily a problem, e.g. if I want to know how tall I am – how many measurements do I need? How accurate are my measurements? How accurate do I want/need to be? How do I express uncertainty in my measurements & answer?

4 What do we mean by data-poor? Examples – average height of a group (sample) of people from a larger group (population). How many do I measure? 10? 20? E.g. how many people in this room? Is this sample “representative”?

5 What do we mean by data-poor? What if I ask a more difficult question? E.g. do you approve of this Government’s policy on tuition fees? Is a yes/no/don’t know answer helpful? How do I quantify any sources of error now? Who do I put the question to?

6 We are data-poor when… We don’t have “enough” information. We have a small number of samples (see random errors) and/or selection bias (see systematic errors) and/or limited time/resources, e.g. questionnaires on hard-to-measure socio-economic indicators, or measurements of highly variable systems. We have large samples BUT large variation, e.g. temperature data over the UK, or incidences of a particular type of cancer. It is hard/impossible to measure the variables we are interested in directly, e.g. climate change? Voting intention?

7 Errors and uncertainty Random errors – examples: physical measurement of distance, time, mass, velocity, voltage. Any instrument/operator has a precision – NOT the same as accuracy!

8 Errors and uncertainty Random errors are easy(ish) to deal with – take several/many measurements (sampling the “true” value) to give a mean value PLUS some estimate of uncertainty. The standard error of the mean is σ_m = σ/√N, where σ is the standard deviation and N is the number of samples. Quote the result as mean ± σ_m – so uncertainty typically falls as 1/√N. Some links: http://level1.physics.dur.ac.uk/docs/errors.pdf http://level1.physics.dur.ac.uk/skills/randomerrors.php
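A minimal sketch of the mean-plus-standard-error recipe above, in Python (the measurement values here are hypothetical, not data from the lecture):

```python
import math
import statistics

# Eight repeated measurements of the same quantity (hypothetical values)
measurements = [10.2, 9.8, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2]

mean = statistics.mean(measurements)
sigma = statistics.stdev(measurements)          # sample standard deviation
sigma_m = sigma / math.sqrt(len(measurements))  # standard error of the mean

# Quote the result as mean +/- standard error
print(f"result = {mean:.3f} +/- {sigma_m:.3f}")
```

Doubling N does not halve the uncertainty: the standard error shrinks only as 1/√N, which is why "take more measurements" has diminishing returns.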

9 Errors and uncertainty Systematic errors – offset or bias in measurements (can be constant or variable). Harder to deal with and must be identified with care. E.g. a wrongly-calibrated instrument – ruler too long, thermometer always 5 deg. too high/low – or making measurements consistently but incorrectly. Particularly problematic for survey data – is a sample “representative”? What do we mean by “representative”? Is there selection bias? http://instructor.physics.lsa.umich.edu/ip-labs/tutorials/errors/syst.html

10 Errors and uncertainty Selection bias examples – a survey on drinking habits: who should I give it to? Approach people on the street? When/where? Approach friends? Family? Colleagues? Can deal with selection bias to a certain extent by thinking very carefully about possible bias – EXPLICITLY consider/remove selection bias in experimental design.

11 Errors and uncertainty E.g. the randomised double-blind trial – the only consistent way to examine the impact of a treatment in medicine. A single group is divided into two samples by e.g. tossing a coin (random assignment to group A or B). Sample A is treated in some way; sample B is given a placebo. Neither researchers nor participants know which is which until the study ends (both “blind”).
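The random-assignment step can be sketched in a few lines of Python (participant IDs and the 50:50 coin are illustrative assumptions, not part of the lecture):

```python
import random

def assign_groups(participants, seed=None):
    """Assign each participant to arm A (treatment) or B (placebo) by a fair coin toss."""
    rng = random.Random(seed)
    arm_a, arm_b = [], []
    for p in participants:
        (arm_a if rng.random() < 0.5 else arm_b).append(p)
    return arm_a, arm_b

# 100 hypothetical participants, seeded for reproducibility
arm_a, arm_b = assign_groups(range(100), seed=42)
print(len(arm_a), len(arm_b))
```

The point of randomisation is that any selection bias (age, health, motivation) is spread across both arms on average, so differences between arms can be attributed to the treatment.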

12 Errors and uncertainty: summary Spread is (probably!) random error; offset is (probably!) systematic error. Figure from: http://www.mathworks.com/access/helpdesk/help/toolbox/daq/f5-28876.html

13 Asking the right question What response are you expecting and why? Is the measurement you make the “best” one, given your hypothesis? If not, why? Can you find a better one? Have you phrased your experiment/hypothesis in such a way as to make it testable logically?

14 Probability and sampling How many measurements do I make? Where and when do I make them? How random is “random”? Probability is a funny thing – often seemingly counterintuitive…

15 Probability is a funny thing How many people do I need in a room before P(B), the probability that at least two people share a birthday, is better than 50:50? i.e. what is the smallest N for which P_N(B) > 0.5? The probability that all N birthdays are distinct is 365!/(365^N (365−N)!), so P_N(B) = 1 − 365!/(365^N (365−N)!).

16 Probability is a funny thing Need to be careful about relying on intuition! NB this assumes all birthdays are equally likely… With P_N(B) = 1 − 365!/(365^N (365−N)!), the smallest N giving P_N(B) > 0.5 is N = 23.
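The birthday calculation above is easy to check numerically – a short Python sketch, assuming (as the slide notes) 365 equally likely birthdays:

```python
def p_shared_birthday(n):
    """P(at least two of n people share a birthday), assuming 365 equally likely days."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1.0 - p_all_distinct

# Find the smallest group size with better-than-evens odds of a shared birthday
n = 1
while p_shared_birthday(n) <= 0.5:
    n += 1
print(n, round(p_shared_birthday(n), 3))
```

Running this confirms the counterintuitive answer: at N = 23 the probability already exceeds 0.5, far below the 180-odd most people guess.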

17 Probability is a funny thing The Monty Hall “Paradox”: 3 doors, behind one is a prize (Monty knows which one). I choose a door. Monty then opens one of the other doors without a prize and asks me if I want to change my choice. Should I change? Does it make any difference?

18 The “Inexpert Witness” Professor Sir Roy Meadow, distinguished paediatrician. Famous for “Munchausen Syndrome by Proxy”. Expert witness in cases of suspected child abuse and murder. Notorious for the high-profile miscarriage of justice in the Sally Clark trial. Material from Prof. Peter Coles, Physics and Astronomy, University of Wales Cardiff

19 The Case of Sally Clark Solicitor Sally Clark was tried in 1999 for the murder of two of her children, Christopher (11 weeks) and Harry (8 weeks). Medical testimony was divided; Meadow’s evidence was decisive, but flawed. An appeal in autumn 2000 was dismissed. A second appeal (on different grounds) in 2003 succeeded, with the ruling also casting doubt on Meadow’s testimony; Clark was released. Sally Clark died on 16 March 2007 of alcohol poisoning. Material from Prof. Peter Coles, Physics and Astronomy, University of Wales Cardiff

20 The Argument The frequency of natural cot deaths (SIDS) in affluent non-smoking families is about 1 in 8500. Meadow argued that the probability of two such deaths in one family is this squared, or about 1 in 73,000,000. This was widely interpreted as meaning that these were the odds against Clark being innocent of murder. The Royal Statistical Society in 2001 issued a press release that summed up the two major flaws in Meadow’s argument. Material from Prof. Peter Coles, Physics and Astronomy, University of Wales Cardiff

21 Independence There is strong evidence that SIDS has genetic or environmental factors that correlate within a family: P(second death | first death) ≈ 1/77, not 1 in 8500. The probability of two deaths is then about 1/8500 × 1/77 ≈ 1 in 650,000 (odds of order 10^5, not 10^7), which changes the odds significantly – P(X and Y) = P(X)P(Y) only if X and Y are independent. Material from Prof. Peter Coles, Physics and Astronomy, University of Wales Cardiff
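The arithmetic behind the independence flaw, using the slide's figures (1 in 8500 and the reported conditional rate of 1 in 77):

```python
# Meadow's (incorrect) independence assumption vs the conditional figure
p_first = 1 / 8500                       # SIDS rate, affluent non-smoking family
p_second_given_first = 1 / 77            # reported conditional rate after one death

p_assuming_independence = p_first ** 2              # 1 in 72,250,000
p_with_dependence = p_first * p_second_given_first  # 1 in 654,500

print(f"1 in {1 / p_assuming_independence:,.0f}  vs  1 in {1 / p_with_dependence:,.0f}")
```

Wrongly assuming independence overstates the rarity of the evidence by a factor of more than 100 – the difference between "almost impossible by chance" and "rare but expected somewhere in a large population".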

22 The Prosecutor’s Fallacy – asking the wrong question Even if the probability calculation were right, it is the wrong probability. P(Murder|Evidence) is not the same as P(Evidence|Murder), although it is easy to confuse the two. E.g. suppose a DNA sequence occurs in 1 in 100,000 people. Does this mean that if a suspect’s DNA matches that found at a crime scene, the probability he is innocent is 1:100,000? No! In a city of 10 million people there will be about 100 other matches, so in the absence of any other evidence the DNA gives odds of about 100:1 against the suspect being guilty. Material from Prof. Peter Coles, Physics and Astronomy, University of Wales Cardiff
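The DNA example above in numbers (a sketch using the slide's illustrative figures):

```python
# The prosecutor's fallacy: P(evidence|innocent) is not P(innocent|evidence)
match_rate = 1 / 100_000          # frequency of the DNA sequence in the population
city_population = 10_000_000

expected_matches = city_population * match_rate   # about 100 people match
# Absent any other evidence, the suspect is just one of the ~100 matching people:
p_guilty_given_match = 1 / expected_matches
print(expected_matches, p_guilty_given_match)
```

The 1-in-100,000 figure is P(match | innocent person chosen at random); the quantity the court needs is P(guilty | match), which here is only about 1%.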

23 Bayes’ Theorem – the “Pie” formula P(I|E) = P(E|I)P(I) / [P(E|I)P(I) + P(E|¬I)P(¬I)] where: P(I|E) is the updated probability we attach to idea I after experience E; P(I) is the previous probability we attached to I before experience E; P(E|I) is the probability of experience E occurring IF our idea is correct; P(E|¬I) is the probability of E occurring even if our idea I is NOT correct (here and hereafter ¬I stands for ‘not I’, i.e. ‘I not being true’); P(¬I) is the previous probability we attached to ‘not I’. The Pie Formula says: “The probability of some idea I being correct, in the light of some evidence E, is the ratio of the probability of E occurring when in fact I is correct, to the probability that E will occur anyway.” Put in those words, if you think about it, it is no more than simple “common sense”. Material from Prof. M. Disney, Physics and Astronomy, University of Wales Cardiff
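The “Pie” formula translates directly into a few lines of Python (a generic sketch; the function name and test values are illustrative, not from the lecture):

```python
def posterior(p_e_given_i, p_i, p_e_given_not_i):
    """Bayes' theorem: P(I|E) = P(E|I)P(I) / [P(E|I)P(I) + P(E|not I)P(not I)]."""
    numerator = p_e_given_i * p_i
    return numerator / (numerator + p_e_given_not_i * (1.0 - p_i))

# Sanity check: evidence that is certain whether or not I holds
# tells us nothing, so the posterior equals the prior
print(posterior(1.0, 0.5, 1.0))
```

Note the denominator is just P(E) – the probability that the evidence “will occur anyway” – expanded over the two cases I and ¬I.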

24 And so for cot deaths? The crux of the matter is P(E|I) – the probability of the accused having 2 dead babies (the evidence E) IF she is innocent (I). With 650,000 live births in the UK each year, ~1:1000 resulting in cot death, and only a handful of double cot deaths, P(E|I) is small, ~1x10^-5 (1:100,000) – so from the police point of view she looks very guilty… But consider the JURY’s point of view – they can be absolutely certain that the accused, guilty or innocent, has got 2 dead babies. So P(E|I) = 1. And, crucially, P(I) is not zero – among 650,000 births per year there will be some innocent mothers with two cot deaths. So, using Bayes’ Theorem and assuming P(I) is, say, 1 in 10, then… Material from Prof. M. Disney, Physics and Astronomy, University of Wales Cardiff

25 So… sampling strategies Stratified random sampling improves representativeness of sampling when homogeneous sub-groups exist, i.e. the population is not continuous. Divide a population into homogeneous subpopulations (strata) and sample independently. Strata should be mutually exclusive: every element in the population must be assigned to only one stratum. E.g. voting intentions – not a continuous variable. Deliberately sample groups which might be missed in a random sample, e.g. small ethnic groupings.

26 Sampling strategies Various strategies for stratified random sampling. E.g. i) Proportionate allocation – the sampling fraction in each stratum is proportional to that stratum’s share of the total population, e.g. for a population that is 60% male and 40% female, the relative sizes of the two samples (say three males and two females in a sample of five) should reflect this proportion. E.g. ii) Optimum/disproportionate allocation – more samples are taken in strata with the greatest variability, e.g. if the variance of women’s heights is twice that of men’s, sample twice as many women as men.

27 Sampling strategies Useful for all kinds of spatial and temporal measurements. Stratify according to population density, e.g. to overcome density disparity: random samples of the UK population will be heavily biased towards the SE, with few/no samples in the N/NE. Stratify according to population, e.g. deliberately select areas in the NE to avoid bias caused by the population of the SE.

28 Summary Consider sources of error (random, systematic). Consider the best experimental design to minimise error: sampling strategy, sample size etc. Include some uncertainty analysis – at the very least, quote results of sampling with some estimate of standard error. Bayesian methods are a very useful (the only real?) way to assess uncertainty…

29 Reading Various texts: Hardisty, J. et al., 1993, Computerised Environmental Modelling: A Practical Introduction Using Excel (Principles and Techniques in the Environmental Sciences), Wiley-Blackwell. Wainwright, J. and Mulligan, M. (eds), 2004, Environmental Modelling: Finding Simplicity in Complexity, John Wiley and Sons. Casti, J. L., 1997, Would-be Worlds, New York: Wiley and Sons. Advanced texts: Gershenfeld, N., 2002, The Nature of Mathematical Modelling, CUP. Boeker, E. and van Grondelle, R., Environmental Science, Physical Principles and Applications, Wiley. Gauch, H., 2002, Scientific Method in Practice, CUP.

30 Monty Hall redux You should always change. But why – surely the odds are 50:50? Think about the possible range of outcomes: Pick the right door to start (1 in 3 chance) – both remaining doors are blank, so changing after Monty opens a blank door means we always lose. Pick the wrong door to start (2 in 3 chance) – the remaining doors are 1 blank and 1 with the prize, so Monty must open the only blank door left, and changing now means we always win. So if we always change we win 2/3 of the time; if we don’t, we only win 1/3 of the time.
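The counting argument above is easy to verify by simulation – a short Monte Carlo sketch in Python (the door encoding and trial count are my choices):

```python
import random

def play(switch, rng):
    """One round of Monty Hall; returns 1 if the player wins the prize."""
    doors = [1, 0, 0]                     # one prize, two blanks
    rng.shuffle(doors)
    pick = rng.randrange(3)
    # Monty opens a blank door that is not the player's pick
    monty = next(d for d in range(3) if d != pick and doors[d] == 0)
    if switch:
        pick = next(d for d in range(3) if d not in (pick, monty))
    return doors[pick]

rng = random.Random(0)
trials = 100_000
win_switch = sum(play(True, rng) for _ in range(trials)) / trials
win_stick = sum(play(False, rng) for _ in range(trials)) / trials
print(win_switch, win_stick)  # close to 2/3 and 1/3
```

Over many trials the switching strategy wins about twice as often as sticking, matching the 2/3 vs 1/3 argument above.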

31 Monty Hall redux – Bayesian analysis For red, green and blue doors: assume we pick the red door (A_r), and define B as Monty opening the blue door. If the prize is behind the red door then Monty can open either the green or blue door equally, so P(B|A_r) = 1/2. If the prize is behind the green door then he must open blue, i.e. P(B|A_g) = 1. If the prize is behind the blue door then he must open green, i.e. P(B|A_b) = 0. So P(A_g|B) = P(B|A_g)P(A_g) / [P(B|A_r)P(A_r) + P(B|A_g)P(A_g) + P(B|A_b)P(A_b)] = (1 × 1/3) / (1/2 × 1/3 + 1 × 1/3 + 0) = 2/3 – switching to the green door wins with probability 2/3, agreeing with the counting argument.

