1 Lies, Damned Lies, and Health Physics Some Random Comments About Statistics in Health Physics Savannah River Chapter of the Health Physics Society Aiken,

1 1 Lies, Damned Lies, and Health Physics
Some Random Comments About Statistics in Health Physics
Savannah River Chapter of the Health Physics Society
Aiken, SC
April 15, 2011
Tom LaBone

2 2 “It is easy to lie with statistics.” “It is hard to tell the truth without statistics.” Andrejs Dunkels
“There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain

3 3 Today
Informal, mostly apocryphal discussion of
  • what statistics really is,
  • who practices statistics and how they do it, and
  • why all of this is important to you as a health physicist
Main message of talk
  • A good working knowledge of statistics is essential in any endeavor where data are collected and analyzed (e.g., health physics)
  • Everyone in the room should become a statistician (of sorts)
No math is used in this presentation and no health physicists were harmed during its preparation

4 4 Health Physics and Statistics
Some HP “stat” books I used in school
  • G. F. Knoll, Radiation Detection and Measurement, 1st Edition, 1979
  • J. Shapiro, Radiation Protection, 1st Edition, 1972
  • H. Cember, Introduction to Health Physics, 1st Edition, 1969
  • R. D. Evans, The Atomic Nucleus, 1955
  • P. R. Bevington, Data Reduction and Error Analysis for the Physical Sciences, 1st Edition, 1969
Statistics was a tool, a “wrench to turn a nut”
  • Is that all it is?

5 5 “Humans are good, she knew, at discerning subtle patterns that are really there, but equally so at imagining them when they are altogether absent.” Carl Sagan in Contact What is Statistics?

6 6 Signals and Noise
Useful information comes to us in the form of signals that form distinct patterns
The signals are contaminated with varying degrees of noise, which can make it difficult to see the signal

7 7 Seeing Patterns
In our evolutionary history, seeing patterns where none existed may have been less harmful than missing patterns that did exist
  • That noise in the grass – is it just the wind or is it a lion?
So, we as a species got very good at seeing patterns, even in the absence of a signal

8 8 Apophenia
Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data
What do you see below?

9 9 Viking 1 Orbiter Mars Global Surveyor Face on Mars

10 10 Face in Food, et cetera

11 11 Face in Data

12 12 Statistics is …
… a science that helps us to differentiate signal from noise and make decisions with a known probability of being wrong
… a very practical, decision-oriented methodology developed to tame our natural tendency to be apopheniacs
… based on the idea that variability and noise are natural and unavoidable
… a relatively modern science that is actively evolving
  • especially since cheap, powerful computers became available

13 13 Really, What is Statistics? Chris Chatfield Problem Solving: A Statistician’s Guide “Statistics is concerned with collecting, analyzing, and interpreting data in the best possible way, where the meaning of “best” depends on the particular circumstances of the practical situation”

14 14 Exploratory Data Analysis
Look at data (usually with graphics) and use our ability to see patterns in the data to
  • Suggest hypotheses to test
  • Assess validity of assumptions on which statistical inference will be based
  • Support the selection of appropriate inferential tests
  • Suggest ideas for further data collection

15 15 [Figure panels: Fecal Samples; Air Filters]

16 16 Confirmatory Data Analysis
Use statistical tests to answer questions about the data along with the risks of reaching the wrong conclusion
  • Is the material on the filters the same material that is in the fecal samples?
  • Are the Pu-239 to Am-241 ratios in the fecal samples and air samples the same once we account for random noise?

17 17 95% CI = (1.33, 1.46) 2 Fecal Samples

18 18 Data Dredging
Are the two Pu-239 to Am-241 ratios the same?
If this question was asked before we saw the data we can proceed with the test to answer it
If this question was inspired by the data then we should not test the same data to get the answer
  • Referred to as data snooping, data dredging, etc.
  • Cancer clusters
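The danger of letting the data suggest the question can be shown with a small simulation. This is only a sketch under stated assumptions: every group is drawn from the same normal distribution (so there is no real signal), the test is a two-sample z-test with a known standard deviation, and the function names, sample sizes, and seed are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, fmean

def two_sample_p_value(a, b, sigma=1.0):
    """Two-sided z-test p-value for equal means, assuming known sigma."""
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def dredge(n_studies=1000, n_comparisons=20, n=30, alpha=0.05, seed=1):
    """Fraction of pure-noise studies in which at least one comparison
    looks 'significant' when many comparisons are tried."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_comparisons):
            # Both samples come from the same population: any "effect" is noise.
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            if two_sample_p_value(a, b) < alpha:
                hits += 1
                break  # one spurious "finding" is enough for this study
    return hits / n_studies

rate = dredge()
expected = 1 - 0.95 ** 20  # chance of at least one false positive in 20 independent tests
print(f"simulated: {rate:.2f}, theoretical: {expected:.2f}")
```

With 20 independent comparisons at the 5% level, the theoretical chance of at least one false positive is 1 − 0.95^20, or about 0.64, even though each individual test is behaving exactly as advertised.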

19 19 Statistical Method
Define the problem
  • Formulate your questions in such a way that unambiguous answers are possible
Collect data
  • Collect data capable of answering your question
Analyze the data
Present the results
  • in terms your audience can understand

20 20 Define the Problem
“It is better to solve the right problem the wrong way than to solve the wrong problem the right way.” Richard Hamming
“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” John Tukey

21 21 Data Collection
Collect data that are capable of answering the question asked (Data Quality Objectives)
  • Designed experiments
  • Observational studies
Sampling
  • You select samples from a population in order to make inferences about the population

22 22 GIGO
The collection of data is often the most time-consuming and expensive part of a study
Reverend Bayes and all of his horses can’t fix a bum dataset

23 23 Analyze the Data
All statistical procedures have assumptions
In practice, the assumptions of any given statistical procedure are violated to some degree
  • Can the validity of the assumptions be verified?
  • Can the validity of the answer be verified?
How robust is your statistical procedure to violations of its assumptions?
Simple approximate solutions you can understand may be better than complex exact solutions that you can’t
Augment standard statistical analyses with simulations
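One way to augment a standard analysis with simulation is to check how often a nominal 95% confidence interval actually covers the true mean when its normality assumption is violated. The sketch below is illustrative only: the sample size of 10, the normal(10, 2) and lognormal(0, 1) populations, the seed, and the function names are all assumptions, and the critical value 2.262 is the tabulated two-sided 95% Student's t value for 9 degrees of freedom.

```python
import math
import random
from statistics import fmean, stdev

N = 10              # hypothetical sample size per simulated study
T_CRIT_9DF = 2.262  # two-sided 95% critical value of Student's t, N - 1 = 9 df

def coverage(draw, true_mean, n_sims=4000, seed=2):
    """Fraction of simulated t-intervals that contain the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sims):
        x = [draw(rng) for _ in range(N)]
        half_width = T_CRIT_9DF * stdev(x) / math.sqrt(N)
        if abs(fmean(x) - true_mean) <= half_width:
            covered += 1
    return covered / n_sims

# Assumption satisfied: normally distributed data.
normal_cov = coverage(lambda rng: rng.gauss(10.0, 2.0), true_mean=10.0)

# Assumption violated: strongly skewed lognormal data (true mean is exp(0.5)).
skewed_cov = coverage(lambda rng: rng.lognormvariate(0.0, 1.0),
                      true_mean=math.exp(0.5))

print(f"normal data coverage: {normal_cov:.3f}")
print(f"skewed data coverage: {skewed_cov:.3f}")
```

If the interval is robust for your kind of data, the simulated coverage stays near the nominal 95%; a clear shortfall for the skewed case is a warning that the stated risk of being wrong is not the real one.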

24 24 Present Results
Technical answer versus the functional answer
  • “the null hypothesis is not rejected”
  • technically “not rejected” ≠ “accepted”
  • functionally “not rejected” = “accepted”
Statistical significance and practical significance
  • Apply “so what” test to your answers
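The “so what” test can be made concrete with a short calculation. The sketch below assumes a two-sample z-test with a known standard deviation; the observed difference, the standard deviation, the sample sizes, and the threshold for a difference that matters are all hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

def z_test_p_value(diff, sigma, n_per_group):
    """Two-sided p-value for a difference in means, assuming known sigma."""
    standard_error = sigma * (2 / n_per_group) ** 0.5
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

OBSERVED_DIFF = 0.02     # hypothetical observed difference between group means
SIGMA = 1.0              # hypothetical known standard deviation
MATTERS_THRESHOLD = 0.5  # hypothetical smallest difference worth acting on

for n in (100, 100_000):
    p = z_test_p_value(OBSERVED_DIFF, SIGMA, n)
    statistically_significant = p < 0.05
    practically_significant = abs(OBSERVED_DIFF) >= MATTERS_THRESHOLD
    print(f"n={n}: p={p:.2g}, "
          f"statistical={statistically_significant}, "
          f"practical={practically_significant}")
```

With these made-up numbers the same tiny difference is not statistically significant at n = 100 but is at n = 100,000, while it stays far below the threshold that matters in both cases.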

25 25 What is a Statistician? “Powerful spirits should only be called by the master himself” Goethe The Sorcerer's Apprentice

26 26 What is a Statistician?
Based on Chatfield’s definition of statistics, anyone who makes decisions based on the analysis of data might be called a statistician
However, the title statistician is usually reserved for a professional who has specialized training in the concepts, theoretical bases, and methodologies of statistics
Key difference between the sorcerer and his apprentice
  • Contrary to what you might think, there is a lot of subjectivity and professional judgment in the practice of statistics
  • Statistics is vast in scope and detail, and the apprentice does not know what he does not know
“It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.” Mark Twain

27 27 The Sorcerer’s Apprentice
We may not be statisticians, but we are clearly doing statistics, often without adult supervision
Doing our own statistics is a good thing, but we need to become better students of the black arts and consult the master before the brooms get out of control
“Should I refuse a good dinner simply because I do not understand the processes of digestion?” Oliver Heaviside [On being criticized for using formal mathematical manipulations without understanding how they worked]

28 28 How We Can be Better Statisticians
Master the basics
Learn the language
Play with your data
Use better software
Perform reproducible work
Consult with a real statistician

29 29 Master the Basics
Khan Academy

30 30 Statistics MS/Certificate Distance Programs
University of South Carolina
Colorado State University
Texas A&M University
Penn State University

31 31 Concepts and Terminology
Specialized Concepts
  • Population versus sample for example
Statistics has a very precise language all its own
  • “the null hypothesis is not rejected”
  • “not rejected” ≠ “accepted”
Questions and answers are not right unless you use the proper language to convey the proper concept
  • some statisticians can be intolerant of laymen who misuse the language of statistics
Learn to phrase questions and interpret answers properly

32 32 Exploratory Statistics
Learn to play with your data and see if it is trying to tell you something new
Study graphs of your data
“There is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.” John Tukey

33 33 Software used for Statistics
I use the following software for statistical calculations (in order of usage)
  • R
  • Minitab
  • SAS
  • Spreadsheet (e.g., MS Excel, Gnumeric)
There are many others

34 34 Spreadsheets (Excel)
What some people can do in Excel is nothing short of amazing (but should they be doing it?)
  • Amarillo Slim beat tennis champ Bobby Riggs at Ping-Pong, using a frying pan instead of a paddle
Spreadsheet Addiction by Patrick Burns
  • diction.html
Problems with spreadsheet implementation
  • Excel has a long history of doing bad stats
Problems with spreadsheet paradigm
  • Reproducible science

35 35 9/28/2007 M. G. Almiron et al. On the Numerical Accuracy of Spreadsheets, Journal of Statistical Software (34) 4, 2010

36 36 Reproducible Research
Reproducible research refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. necessary for reproduction of the results
Raw Data → Data Massaging → Calculations → Plots and Tables → Final Paper
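The stages named on this slide can be sketched as a single script in which every step from raw data to the final table is code that can be rerun. This is a minimal illustration, not the author's workflow: the function names, the seed, and the simulated "raw data" (standing in for a real data file) are all hypothetical.

```python
import random
from statistics import fmean, stdev

SEED = 2011  # hypothetical fixed seed so the simulated raw data are repeatable

def load_raw_data(seed=SEED, n=25):
    """Raw data: a stand-in for reading measurements from a file."""
    rng = random.Random(seed)
    return [rng.gauss(1.4, 0.1) for _ in range(n)]

def massage(raw):
    """Data massaging: drop physically impossible (non-positive) values."""
    return [x for x in raw if x > 0]

def calculate(clean):
    """Calculations: summary statistics of the cleaned data."""
    return {"n": len(clean), "mean": fmean(clean), "sd": stdev(clean)}

def make_table(results):
    """Plots and tables: a plain-text table for the final paper."""
    return "\n".join(f"{name:>5}: {value:.4g}" for name, value in results.items())

def run_pipeline():
    """Rerun every stage from raw data to the table that goes in the paper."""
    return make_table(calculate(massage(load_raw_data())))

report = run_pipeline()
print(report)
```

Because no step is done by hand, rerunning the script reproduces the same table, and anyone reviewing the work can read exactly what was done to the data.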

37 37 The R Project for Statistical Computing
R is a language and environment for statistical computing and graphics
R is available as Free Software under the terms of the GNU General Public License in source code form
It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS
Download from

38 38 Advantages of R
Command line interface rather than a GUI
  • Promotes reproducible statistics
Open source
  • Flexible licensing
  • Availability of source code for peer review
  • Bugs are public knowledge and are fixed quickly
  • New tests and methods tend to appear first in R
Many dozens of recently published books devoted to R
Free (and very good) community support available

39 39 Consult with a Statistician
If you are going to involve a statistician, do it at the study design and data collection phases
  • If not, at least estimate how much it will cost to collect the data all over again
Anybody can analyze compelling data
“To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.” Sir Ronald Fisher

40 40 Twisted Answers to Crooked Questions
As health physicists there are times when a decision will be made, with or without good data and a proper statistical analysis
In such situations we base our decisions on professional judgment, often augmented with “statistics”
  • We must not fool ourselves about what we are doing … of all the wrong answers we have to choose from, this one is the best
  • We have no right to expect a statistician to endorse such mischief

41 41 The Apprentice Should Beware of …
The Management Prior
Being bamboozled by other people’s statistics
“The only right way to do this is X [insert statistical method here]”
Being seduced by complexity

42 42 Statistics in the Workplace: Musings of a Sorcerer's Apprentice
Presentation to USC Stat Club March 26, 2009
Main message
  • A degree in statistics is a “Swiss Army Knife” that is very useful in any endeavor where data are collected and analyzed
  • Everyone in the room should become a health physicist (I had no takers)
