The Role of Statistical Analysis within a Broader Research Methodology Simon French

1 The Role of Statistical Analysis within a Broader Research Methodology Simon French

2 Research projects differ
Some seek to explore issues
–field studies vs laboratory studies
Some seek to confirm or disprove hypotheses
–field studies vs laboratory studies
Some seek to critically evaluate an area
Some seek to solve a problem and implement a solution
Some design and implement new systems
Some seek to develop new theory or algorithms
(Social) Sciences
Engineering & mathematics

3 Statistics
There are lies, damn lies, and overused quotations
"A statistician is someone who wanted to be an accountant but did not have the charisma." Anon
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." H.G. Wells
"It is the function of the statistical method to emphasise that precise conclusions cannot be drawn from inadequate data." E.S. Pearson and H.O. Hartley
"A witty statesman once said that you might prove anything by figures (but) a judicious man looks at statistics not to get knowledge but to save himself from having ignorance foisted upon him." Thomas Carlyle
"He uses statistics as a drunk uses a street lamp, for support rather than illumination." Andrew Lang

4 Statistics is the analytic heart of scientific research and inference. It is not a numerical add-on; nor should it be seen as a hurdle to publication. So off to the Welsh Valleys!

5 Cynefin: a Welsh habitat
D. Snowden (2002). "Complex acts of knowing - paradox and descriptive self-awareness." Journal of Knowledge Management 6 pp
Cynefin: physical environment, cultural environment, social environment, historical environment, …

11 Learning and knowledge

12 Knowledge Management and Nonaka's SECI
The practice of science and research
Sense-making and articulation are as important to science and research

13 Cynefin and Knowledge Management Tacit Knowledge Judgement/expertise Explicit Knowledge e.g. Scientific Models

14 Applications of Cynefin
Emergency Management
Categorisation of DSS and OR/DA techniques
Human Reliability Analysis
High Reliability Organisations
Knowledge Management
Sensemaking
Research Methodology
S. French (2012) Cynefin, Statistics and Decision Analysis. Journal of the Operational Research Society. In press

15 Cynefin: learning, repeatability Repeatability and increasing familiarity

16 Cynefin and data collection Experiments and trials Case studies, interviews, and surveys

17 Cynefin and statistics Repeatable events Unique events Events? Estimation and confirmatory analysis exploratory analyses

18 Cynefin and statistics Repeatable events Unique events Events? Actually you need exploratory statistics here to check that you really are in the known or knowable space

19 Exploratory analyses
Look at the data
–In any, repeat any, analysis, look at the data
–It is too easy for data to pass from web questionnaire to Excel to SPSS to analysis without your looking at the data.
Simple plots and tables
–Tables: do not think them simple to construct!
–Histograms, boxplots, scatterplots, …
Useful in presenting results too
Generally easy to produce with Excel or SPSS
–If you know what you are trying to achieve
–Data mining and data visualisation
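
As a rough illustration of "look at the data" before any formal analysis, the sketch below prints a five-number summary and a crude text histogram using only the Python standard library. The scores are made up for the example, and the helper names `five_number_summary` and `text_histogram` are hypothetical, not from the talk.

```python
import statistics

# Hypothetical sample: made-up questionnaire scores, not data from the talk.
scores = [3, 4, 4, 5, 5, 5, 6, 6, 7, 21]


def five_number_summary(data):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


def text_histogram(data, bin_width):
    """Return one text line per occupied bin, with one '#' per data point."""
    counts = {}
    for x in data:
        start = (x // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    lines = []
    for start, n in sorted(counts.items()):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:<6}| " + "#" * n)
    return lines


summary = five_number_summary(scores)
print(summary)
for line in text_histogram(scores, bin_width=5):
    print(line)
```

Even this crude picture shows one value (21) sitting far from the rest, which is exactly the kind of feature that is easy to miss when data goes straight from questionnaire to analysis.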

20 Estimation and Confirmatory Analyses
Based on statistical models
–"If your experiment needs statistics you should have done a better experiment": WRONG!
Estimation
–Point estimates
–Confidence intervals
Hypothesis tests

21 Data collection protocol
You need one!!!
Formal theory of experimental design
–How many and which data to collect
–Mix of theoretical requirements for accuracy and pragmatism
But wider than that, you need to plan in advance many things about how you will gather your data, be it qualitative or quantitative.
It is vital that you record your planning and your reasoning.
–You will not remember when you come to write your thesis/paper
You also need a data storage protocol
–Keep original data, not summaries, if you can
–Sufficient statistics are for the theoreticians
–Keep a geographically separate copy for security purposes

22 Check assumptions
Independence. Usual to assume that the data points are sampled independently so that
–x_1, x_2, …, x_n are independent and identically distributed (iid)
Think about distributional assumptions
–Parameters known?
–Normal??? Maybe as an approximation, but check!
–Do not make assumptions on the grounds that the textbook gives a statistical test for those assumptions
Ideally repeat analysis under different assumptions
–Sensitivity analysis
Outliers
–Some recommend removing data that is clearly an outlier
–My view: a bad scientist blames his data, so discard data at your peril
–If you must remove outliers, document reasons and make sure they are good.
If you cannot see the result in the data (simple plots) and/or it does not make qualitative sense, question it!
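
One simple form of sensitivity analysis is to repeat the same summary with and without a suspect point and see how much the conclusion moves. The sketch below does this for the mean, median, and standard deviation; the measurements are invented and `summarise` is a hypothetical helper name.

```python
import statistics

# Hypothetical measurements; the last value is a suspected outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 14.0]


def summarise(values):
    """Return mean, median and sample standard deviation of the values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


with_all = summarise(data)
without_suspect = summarise(data[:-1])

# How far does each summary move when the suspect point is dropped?
shift_in_mean = with_all["mean"] - without_suspect["mean"]
shift_in_median = with_all["median"] - without_suspect["median"]

print("all data:       ", with_all)
print("suspect removed:", without_suspect)
print("shift in mean:", shift_in_mean, "shift in median:", shift_in_median)
```

Here the mean moves far more than the median, so a conclusion resting on the mean depends heavily on one point. That is a reason to document and investigate the point, not by itself a licence to discard it.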

23 Value focused thinking
"Values are what we care about. As such, values should be the driving force for our decision making. They should be the basis for the time and effort we spend thinking about decisions. But this is not the way it is. It is not even close to the way it is." Keeney (1992)
Define objectives, research questions, hypotheses at outset
–(probably modify pragmatically as research progresses!)
–More creative in research design
Focuses attention on what matters
Helps identify the right research/problem solving methodology
Note: whether we talk of objectives, research questions, or hypotheses depends on the type of research project

24 Thank you

25 Back up Slides

26 Tables and Charts
Clarify in titles and notes
–What the data are and where they come from
–Units
2 or 3 ideas can be shown/explored in a table or chart … no more
–Do not make them over-busy: they are not dustbins for data on waste!
–Do not introduce spurious features
 E.g. number the data and accidentally introduce a ranking
Watch for cognitive aspects
–Appropriate scales
–Appropriate number of significant figures
–In tables: put important variation down the columns
–Use of colour: red-green reads as bad (stop) and good (go), or is not seen at all by the colour blind

27 Regression and Factor Analysis as exploratory analyses
Often (usually!!!) data is multi-dimensional. It is difficult to see the key trends and variations by eye. Regression and factor analyses reduce dimensions to the significant ones.

28 Regression Analysis [scatter plot of data points] Describe the cloud of data points: 16 (x,y) points = 32 numbers

29 Regression Analysis [scatter plot of data points with fitted line] Describe the cloud of data points. Regression line: y = mx + c, plus standard deviation: 3 numbers (trend, base case, and spread)
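
The three-number summary of a point cloud (slope m, intercept c, and spread about the line) can be sketched with ordinary least squares. The data below are invented, `fit_line` is a hypothetical helper, and the spread is taken here as the residual standard deviation with an n - 2 divisor, which is one common convention rather than necessarily the one intended on the slide.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c; returns (m, c, residual spread)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx              # trend
    c = y_bar - m * x_bar      # base case (value of y at x = 0)
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    # Residual standard deviation; n - 2 because two parameters were fitted.
    spread = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return m, c, spread


# Hypothetical data: five (x, y) points, i.e. ten numbers.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c, spread = fit_line(xs, ys)
print(f"trend m={m:.2f}, base case c={c:.2f}, spread={spread:.2f}")
```

Ten numbers go in and three come out, which is the sense in which regression describes the cloud rather than each point.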

30 Factor Analysis [scatter plot of data points] Describe the cloud of data points: 16 (x,y) points = 32 numbers

31 Factor Analysis [scatter plot of data points projected onto a line] Describe the cloud of data points. Project each point onto the regression line: 16 numbers. Keeps each item separate in the summary.
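
The projection step can be sketched as follows: each (x, y) point is replaced by a single number, its signed position along a chosen line. This is only the geometric picture drawn on the slide, not a full factor analysis; the points, the line, and the helper name `project_onto_line` are assumptions made for illustration.

```python
import math


def project_onto_line(points, slope, through):
    """Signed position along the line of each point's orthogonal projection.

    The line has the given slope and passes through the point `through`.
    """
    x0, y0 = through
    norm = math.hypot(1.0, slope)
    ux, uy = 1.0 / norm, slope / norm  # unit vector along the line
    return [(x - x0) * ux + (y - y0) * uy for x, y in points]


# Hypothetical points; the line y = x through the origin is chosen by hand.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (0.0, 2.0)]
scores = project_onto_line(points, slope=1.0, through=(0.0, 0.0))

for point, score in zip(points, scores):
    print(point, "->", round(score, 3))
```

Each item keeps its own score, so the summary has one number per point. The points (1, 1) and (0, 2) receive the same score, which shows what is lost: variation perpendicular to the line disappears from the summary.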

32 Regression and Factor Analysis
Here we have reduced 16 points in 2 dimensions onto 1 dimension (a line)
Generally, reduce a lot of points in high dimension onto many fewer dimensions
More general methods are known as multivariate analysis
–Regression analysis
–ANOVA
–Factor Analysis
–Principal components
–Multi-dimensional scaling
–…

33 Ordinal and Interval Data
Sometimes data only contains ranking information
–Such data is called ordinal data
Other times the data is measured against a scale with an origin and a unit
–Such data is called interval data (or cardinal)
Most of the methods of multivariate analysis assume interval data
–But they work with ordinal data if you take them with a pinch of salt! (and do not believe or quote significance levels, etc.)
–Read the assumptions behind the methods when using SPSS or similar.

34 Estimation
Try to find a function of the data that is tightly distributed about the quantity of interest.
[Figure, two panels: the distribution of the data, with a data point and the quantity of interest marked; the distribution of the mean, with the data mean and the quantity of interest marked]
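
The sample mean is the standard example of such a function. As a worked statement, assuming the data x_1, …, x_n are iid with mean \mu (the quantity of interest) and finite variance \sigma^2 (these symbols are introduced here and are not on the slide):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\mathrm{E}[\bar{x}] = \mu,
\qquad
\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n},
\qquad
\mathrm{sd}(\bar{x}) = \frac{\sigma}{\sqrt{n}} .
```

So a single data point is spread about \mu with standard deviation \sigma, while the mean of n points is spread with standard deviation \sigma/\sqrt{n}: with n = 25 the mean is five times more tightly distributed than one observation.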

35 Confidence intervals
Intervals defined from the data.
95% confidence intervals: calculate the interval for each of 100 data sets and about 95 will contain the quantity of interest.
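
The "about 95 in 100" reading can be checked by simulation. The sketch below assumes normally distributed data with a known standard deviation, so that the interval is mean ± z·σ/√n; the function name `coverage`, the parameter values, and the seed are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, mean


def coverage(true_mean, sigma, n, n_datasets, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)      # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5    # known-sigma interval half-width
    hits = 0
    for _ in range(n_datasets):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / n_datasets


# Simulate many data sets of 25 points each and count how often the
# interval computed from the data covers the quantity of interest.
fraction = coverage(true_mean=10.0, sigma=2.0, n=25, n_datasets=1000)
print(f"fraction of intervals containing the true mean: {fraction:.3f}")
```

The 95% describes the procedure over repeated data sets: any single interval either contains the quantity of interest or it does not.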

36 Hypothesis testing
Hypothesis test: general
–Compare a null hypothesis H_0 and an alternative H_1
–Type 1 error: reject H_0 when H_0 is true
–Type 2 error: do not reject H_0 when H_1 is true
 Note: never say "accept an hypothesis"! Best phrasing is "there is/is not significant evidence against H_0"
–Significance level is the probability of a type 1 error
–Power is 1 minus the probability of a type 2 error, i.e. the probability of rejecting H_0 when H_1 is true
–Conventionally the significance level is set at 5% (significant) or 1% (highly significant)
–Define g(x) and a critical region such that the probability that g(x) lies in the region is no more than the significance level if H_0 is true
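
To make the critical region, type 2 error, and power concrete, here is a minimal sketch for one specific case: a one-sided z-test of H_0: mean = 0 against H_1: mean = mu1 > 0, with normal data and known standard deviation. Here g(x) is the standardised sample mean and the critical region is g(x) > critical value. The function name and the numbers are assumptions for illustration; power means the probability of rejecting H_0 when H_1 is true, i.e. 1 minus the type 2 error probability.

```python
from statistics import NormalDist


def one_sided_z_test_power(mu1, sigma, n, alpha=0.05):
    """Return (critical value, P(type 2 error), power) for the z-test.

    H_0: mean = 0, H_1: mean = mu1 > 0, known sigma, sample size n.
    Test statistic g(x) = sqrt(n) * sample_mean / sigma; reject H_0
    when g(x) exceeds the critical value.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)   # P(g > critical | H_0) = alpha
    shift = mu1 * n ** 0.5 / sigma             # mean of g(x) under H_1
    type2 = std_normal.cdf(critical - shift)   # P(do not reject | H_1)
    return critical, type2, 1 - type2


critical, type2, power = one_sided_z_test_power(mu1=0.5, sigma=1.0, n=25)
print(f"critical value {critical:.3f}, type 2 error {type2:.3f}, power {power:.3f}")
```

With these hypothetical values the significance level is fixed at 5% by construction, while the power depends on the effect size mu1, the noise sigma, and the sample size n.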

37 Hypothesis testing
Note that a 5% significance level means that, when H_0 is true, about 1 in 20 tests will reject it (a type 1 error).
Thus if you perform lots of tests in your research you will necessarily make lots of mistakes!!!!!
There are theories of multiple testing to help avoid misinterpretation in such cases
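
The arithmetic behind this warning can be sketched as follows, assuming k independent tests in which every null hypothesis is true. The Bonferroni correction shown is one simple example of a multiple-testing adjustment, chosen here for illustration; the slide does not name a specific method, and the function names are hypothetical.

```python
def familywise_error_rate(alpha, k):
    """P(at least one type 1 error) in k independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_level(alpha, k):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / k


k = 20
uncorrected = familywise_error_rate(0.05, k)
corrected = familywise_error_rate(bonferroni_level(0.05, k), k)
print(f"{k} tests at 5%: P(at least one false rejection) = {uncorrected:.3f}")
print(f"{k} tests at 5%/{k}: P(at least one false rejection) = {corrected:.3f}")
```

With 20 tests at the 5% level, at least one false rejection is more likely than not; testing each at 5%/20 brings the overall chance back under 5%, at the cost of reduced power for each individual test.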

38 Meta-Analysis
Often there are several related studies in the literature
–Datasets collected under similar conditions
–Analysis of similar research questions
How do we combine their results and conclusions?
Key point: literature bias
–Non-significant results not published
–Some authors cited more often and easier to find than others
Assumptions of analysis often not fully clear
–Data collection procedure
–Outliers? Raw data or outliers discarded?

39 Meta-Analysis: key points
Plan it and define a protocol before beginning
–Just as you would define any other data collection procedure
Define criteria for inclusion of studies a priori and use these to guide a deep and detailed literature search.
Plot the different data sets on the same scales and eyeball them
–Explore these data just as you would an experimental data set
No right method for combining analyses, so try several if possible and look for common conclusions (or explain differences in terms of different assumptions)
Check sensitivity and robustness of your combined conclusion as in any other analysis
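
In the spirit of "try several", the sketch below combines the same study estimates in two ways: a plain unweighted mean, and an inverse-variance weighted mean (the usual fixed-effect pooling). Neither is presented as the right method, both are choices made here for illustration, and the study estimates and standard errors are invented.

```python
def unweighted_mean(estimates):
    """Treat every study equally, ignoring its precision."""
    return sum(estimates) / len(estimates)


def inverse_variance_mean(estimates, std_errors):
    """Weight each study by 1 / (standard error)^2.

    Returns the pooled estimate and its standard error under the
    fixed-effect assumption that all studies estimate the same quantity.
    """
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se


# Hypothetical effect estimates and standard errors from three studies.
estimates = [0.30, 0.50, 0.10]
std_errors = [0.10, 0.20, 0.05]

simple = unweighted_mean(estimates)
pooled, pooled_se = inverse_variance_mean(estimates, std_errors)
print(f"unweighted mean: {simple:.3f}")
print(f"inverse-variance pooled: {pooled:.3f} (se {pooled_se:.3f})")
```

The two answers differ noticeably because the most precise study has the smallest estimate. A gap like this is a prompt to look at the studies' assumptions, not something to resolve by picking the method with the preferred answer.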
