What on earth is a p value, a Process sigma, Cronbach’s alpha, the Black-Scholes formula, a Priority in AHP, or the Sunday Times score for Portsmouth University?
On the interpretability of measurements based on mathematical models.
Michael Wood June

Management makes use of many measurements based on mathematical models, but these are often difficult to interpret sensibly. This talk will look at some examples of such measurements, and the consequences of the problems of their interpretation – including the employment of unnecessary academics to teach what should be obvious, and supporting the bad decisions which led to the recent financial crash. I will then discuss how these, and other, measurements could be redesigned to make them more useful and user-friendly.

I’ll look at four examples:
1. Six sigma and the process sigma measurement
2. Null hypothesis significance tests and p values
3. University league tables
4. Risk measurements and the normal (Gaussian) distribution

Four examples … with some imaginary dialogues between the expert and a naive user...

Process sigma – the measurement linked to the Six Sigma philosophy
The process sigma for this process is …
What on earth does this mean?
It means there are 430 dpmo (defects per million opportunities). Use this Sigma calculator.
So why not just say 430 dpmo? Keep it simple!
But this would be dumbing down. Life is difficult and we mustn’t join the modern trend of trying to make it easier.
Why not? The complicated version adds nothing except confusing the uninitiated. (Similar comments apply to Cpk.)
... which must be a good thing!
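To make the dialogue concrete, here is a minimal sketch of how a sigma calculator might turn 430 dpmo into a process sigma. It assumes the conventional Six Sigma recipe (a standard normal z-score plus a 1.5 sigma "long-term shift"); the talk does not say which convention its calculator uses, and the function names are illustrative only.

```python
from statistics import NormalDist

# Conventional Six Sigma "long-term shift". This is an assumption:
# the talk does not state which convention its sigma calculator uses.
SHIFT = 1.5


def process_sigma(dpmo: float, shift: float = SHIFT) -> float:
    """Convert defects per million opportunities into a process sigma."""
    defect_rate = dpmo / 1_000_000
    # z-score that leaves `defect_rate` in the upper tail of a standard normal
    z = NormalDist().inv_cdf(1 - defect_rate)
    return z + shift


def dpmo_from_sigma(sigma: float, shift: float = SHIFT) -> float:
    """Inverse conversion: process sigma back to defects per million."""
    z = sigma - shift
    return (1 - NormalDist().cdf(z)) * 1_000_000


if __name__ == "__main__":
    print(f"430 dpmo -> process sigma {process_sigma(430):.2f}")
    print(f"6 sigma  -> {dpmo_from_sigma(6):.1f} dpmo")
```

The two numbers carry the same information; the sketch only shows that the sigma figure is the dpmo figure passed through a normal-distribution lookup and a fixed offset.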

p values
We’ve done a survey and found that women are more intelligent than men. p value is …
What does the p value mean?
It tells us how sure we can be about our results taking sampling error into account.
… is very small. Not very impressive!
It’s a bit difficult to explain p values to someone like you, but smaller is better. Less than 5% means you can be fairly sure women are cleverer than men, less than 1% is almost conclusive.
Sounds like you’re trying to confuse me …
Reverse measure of wrong thing, misinterpreted
Statman bits. User friendly units - $/inch, etc.

… p values
I’m told that if the p value is … this means that we can be 99.8% confident that women really are more intelligent based on this data. Isn’t that a better way to put it?
No, that’s a common misunderstanding... you need to go on a course, although I’m not sure you’ll take it in...
There are lots of common misunderstandings, but I’m sure about the 99.8% confident...

University League tables
The Sunday Times score for Portsmouth University is 599.
What does that mean?
Well … e.g. Southampton got 783 points so Southampton is obviously a better place to study
What are the points based on?
Lots of things: e.g. Student satisfaction, Research quality
So do Southampton do better on these two?...

... University League tables
Actually Portsmouth do a little better on student satisfaction (174 vs 169/250), but Southampton do better on research quality (136 vs 112/200)
But student satisfaction is more important to students than research quality...
You’ve got to balance the two. The experts at the Sunday Times have done this.
But different people may want different things...
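The point that "different people may want different things" can be illustrated with a small sketch. It uses only the two component scores quoted in the dialogue; the weights are hypothetical and are not the Sunday Times weighting, which involves many more components.

```python
# Component scores quoted in the dialogue (out of 250 and 200 respectively).
scores = {
    "Portsmouth": {"student_satisfaction": 174, "research_quality": 112},
    "Southampton": {"student_satisfaction": 169, "research_quality": 136},
}


def rank(weights: dict) -> list:
    """Return university names ordered best-first for the given weights."""
    totals = {
        uni: sum(weights[name] * value for name, value in parts.items())
        for uni, parts in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical reader who values both components as published.
    print(rank({"student_satisfaction": 1.0, "research_quality": 1.0}))
    # Hypothetical student who cares far more about satisfaction.
    print(rank({"student_satisfaction": 1.0, "research_quality": 0.1}))
```

With the first set of weights Southampton comes out ahead; with the second, Portsmouth does. A single published total hides that choice of weights from the user.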

Measurements of risk
Muddled Michael has a habit of losing his car keys when he goes on holiday. He reckons he has a 25% chance of losing his keys. He decides to consult an expert on risk …
Easy! If he takes 9 spare keys with him, then the probability of losing all 10 keys is 0.25^10, which is about one chance in a million … which seems an acceptable risk.
Michael puts all 10 keys on the same key ring (he doesn’t want to confuse himself by putting them in different places) and goes on holiday.
The problem here is that the maths assumes that losing each key is an independent event. In fact if he loses one key he will probably lose the rest as well, so a more realistic estimate of losing all his keys is 25%!
There are similar assumptions underlying most risk calculations – but if the calculations are more complicated it is easy not to notice.
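The arithmetic behind the key-ring story can be sketched in a few lines. The function name and the `same_ring` flag are illustrative; the "same ring" case assumes, as the slide does, that losing one key on the ring means losing them all.

```python
def p_lose_all(p_lose_one: float, n_keys: int, same_ring: bool) -> float:
    """Probability of losing every key.

    same_ring=False: each key is lost independently (the expert's assumption).
    same_ring=True:  all keys are lost together (what Michael actually did).
    """
    if same_ring:
        return p_lose_one
    return p_lose_one ** n_keys


if __name__ == "__main__":
    independent = p_lose_all(0.25, 10, same_ring=False)
    together = p_lose_all(0.25, 10, same_ring=True)
    print(f"independent keys: {independent:.2e} (about 1 in {1 / independent:,.0f})")
    print(f"one key ring:     {together:.0%}")
```

The formula is the same in both cases; only the independence assumption changes, and it moves the answer from roughly one in a million to one in four.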

Risk and the weather
The probability of more than 1 mm of rain falling in Southampton in one day is 31.5% (Estimated from Met Office graph based on data.)
Then, theoretically, the probability of a week when it rains every day is 0.315^7, which suggests that this happens about every 9 years.
–Two weeks with rain every day is a “once in … years” event.
Almost certainly happens more often – last time was November 2009, and the time before was … of the same month (Southampton Weather website)
The theory is wrong because the assumptions are wrong!
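A minimal sketch of the "theoretical" calculation for rainy spells follows. It assumes days are independent and that every day can be the start of a run, which is one reading that reproduces the slide's "about every 9 years" figure; the function names and the 365.25-day year are assumptions for illustration.

```python
P_RAIN = 0.315          # daily probability of more than 1 mm of rain (from the slide)
DAYS_PER_YEAR = 365.25  # assumption used to convert days to years


def run_probability(days: int, p_rain: float = P_RAIN) -> float:
    """Probability that `days` consecutive days are all rainy, if days are independent."""
    return p_rain ** days


def return_period_years(days: int, p_rain: float = P_RAIN) -> float:
    """Rough expected wait, in years, treating each day as the start of a possible run."""
    return 1 / run_probability(days, p_rain) / DAYS_PER_YEAR


if __name__ == "__main__":
    for spell in (7, 14):
        print(
            f"{spell} rainy days in a row: p = {run_probability(spell):.2e}, "
            f"about once every {return_period_years(spell):,.0f} years"
        )
```

Rain on one day makes rain on the next more likely, so the independence assumption fails and the real return periods are much shorter than the ones this calculation produces.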

Risk and the normal distribution
Very similar assumptions underlie the normal (Gaussian) distribution. This assumes that the variable depends on a large number of small independent factors. If not, the predictions can be misleading, especially for rare events.
Many finance measurements depend on the normal distribution and similar assumptions – e.g. the Black-Scholes formula. OK in normal times, but tends to seriously underestimate the probability of big falls.
“If the Dow Jones Industrial average moved in accordance with a normal distribution, it would have moved by 4.5% or more on only six days between 1996 and 2003 …. In reality … 366 times” (Mandelbrot cited by Buckley, 2011, p. 140).
Black Monday (1987) was a 20 sd event, a once in a million year event, experienced several times by people much younger than a million years (Buckley, 2011, p. 141).
Measures “understood” but not assumptions … trust in a misunderstood version …
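To show how quickly the normal distribution rules out large moves, here is a small sketch of its tail probabilities. It works in standard-deviation units only, because the talk does not give the daily standard deviation of the Dow; the function name is illustrative.

```python
import math


def two_sided_tail(k: float) -> float:
    """P(|Z| >= k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 10, 20):
        p = two_sided_tail(k)
        print(f"move of {k:>2} sd or more: p = {p:.3e} (1 in {1 / p:.3g})")
```

Under the normal model the probability collapses extremely fast as the move grows, which is why events that the model treats as practically impossible can still turn up repeatedly in real markets when the independence assumption does not hold.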

What can go wrong?
1. Unnecessary time and effort expended
–E.g. 50% of time spent on stats courses could be saved by redesigning concepts? Big savings in time and effort possible!
2. Failure to understand
a) Complete
b) Subtleties
3. Misunderstanding
a) Of basic concept
b) Of assumptions leading to misleading uses

... for example...
P values
–Massive amount of wasted time and energy (think of all those journal articles), general confusion, misinterpretations like significant=important
University league tables
–scores taken too seriously, specific requirements ignored, creates uniformity because everyone thinks the same; rational world would be more varied
Risk
–ignoring unrealistic assumptions led to over-confidence in mathematical measures which helped the financial crash...

Principles for designing measurements for understanding
Remember most measurements determined by historical accident – therefore can probably be improved for current users and uses. Design not discovery.
Name should reflect meaning of result, not the method used to get there
Make sure the direction is intuitive, use units and percentages as appropriate
Must be an accurate description of meaning of measurement in users’ language
Users must understand key assumptions (which are not irrelevant technicalities). If possible users should follow general idea of derivation.

Reasons for the persistence of strange measurements
Aim often ticking a box, not understanding
–Users don’t see problem
Interests of experts and teachers
–Mystification is good for business! Some measurements (e.g. process sigma) invented solely for this purpose?
The dumbing down myth
–Increased user-friendliness should lead to more, not less, powerful use of measurements
–We need to dumb up so that even the dumb won’t do dumb things

References
Buckley, Adrian (2011). Financial Crisis: causes, context and consequences. Harlow: Pearson Education.
I Six Sigma (2011). Sigma calculator.
Met Office graph
Southampton Weather website