1-1999 SINTEF Telecom and Informatics EuroSPI’99 Workshop on Data Analysis Popular Pitfalls of Data Analysis Tore Dybå, M.Sc. Research Scientist, SINTEF.

Slides:



Advertisements
Similar presentations
Chapter 2 The Process of Experimentation
Advertisements

Chapter 8 How Good is the Evidence: personal experience, testimonials & appeals to authority?
Chapter 10.  Real life problems are usually different than just estimation of population statistics.  We try on the basis of experimental evidence Whether.
Chapter 16: Inference in Practice STAT Connecting Chapter 16 to our Current Knowledge of Statistics ▸ Chapter 14 equipped you with the basic tools.
EPIDEMIOLOGY AND BIOSTATISTICS DEPT Esimating Population Value with Hypothesis Testing.
Chapter 7: Statistical Analysis Evaluating the Data.
Common Problems in Writing Statistical Plan of Clinical Trial Protocol Liying XU CCTER CUHK.
S S (5.1) RTI, JAIPUR1 STATISTICAL SAMPLING Presented By RTI, JAIPUR.
Independent Sample T-test Classical design used in psychology/medicine N subjects are randomly assigned to two groups (Control * Treatment). After treatment,
Introduction, Acquiring Knowledge, and the Scientific Method
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Introduction to the design (and analysis) of experiments James M. Curran Department of Statistics, University of Auckland
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT OSMAN BIN SAIF Session 14.
Inferential Statistics
RESEARCH DESIGN.
AM Recitation 2/10/11.
Respected Professor Kihyeon Cho
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Chapter 24 Statistical Inference: Conclusion.
Copyright © Cengage Learning. All rights reserved. 8 Tests of Hypotheses Based on a Single Sample.
Technical Adequacy Session One Part Three.
Understanding Statistics
CHAPTER 16: Inference in Practice. Chapter 16 Concepts 2  Conditions for Inference in Practice  Cautions About Confidence Intervals  Cautions About.
Chap 20-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 20 Sampling: Additional Topics in Sampling Statistics for Business.
Undergraduate Dissertation Preparation – Research Strategy.
PARAMETRIC STATISTICAL INFERENCE
CHAPTER 18: Inference about a Population Mean
Experimental Design making causal inferences Richard Lambert, Ph.D.
Significance Toolbox 1) Identify the population of interest (What is the topic of discussion?) and parameter (mean, standard deviation, probability) you.
Scientific Inquiry & Skills
Chapter 1 Measurement, Statistics, and Research. What is Measurement? Measurement is the process of comparing a value to a standard Measurement is the.
CHAPTER 16: Inference in Practice ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture Presentation.
No criminal on the run The concept of test of significance FETP India.
Inference and Inferential Statistics Methods of Educational Research EDU 660.
6. Evaluation of measuring tools: validity Psychometrics. 2012/13. Group A (English)
Tahir Mahmood Lecturer Department of Statistics. Outlines: E xplain the role of sampling in the research process D istinguish between probability and.
Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.
Introduction to Earth Science Section 2 Section 2: Science as a Process Preview Key Ideas Behavior of Natural Systems Scientific Methods Scientific Measurements.
Misuses of Quantitative Research V. Darleen Opfer.
SP_IRS Introduction to Research in Special and Inclusive Education(Autumn 2015) Lecture 1: Introduction Lecturer: Mr. S. Kumar.
Chapter 16: Inference in Practice STAT Connecting Chapter 16 to our Current Knowledge of Statistics ▸ Chapter 14 equipped you with the basic tools.
How to Avoid the Lies and Damned Lies: Pitfalls of Data Analysis Clay Helberg Special Topics in Marketing Research Dr. Charles Trappey Summarized by Kevin.
DESIGNING AN EXPERIMENT.  Scientific Inquiry – the process of gathering evidence about the natural world and giving explanations based on evidence. DESIGNING.
Chapter 3: Statistical Significance Testing Warner (2007). Applied statistics: From bivariate through multivariate. Sage Publications, Inc.
Issues of Quantitative Research: Part 1 V. Darleen Opfer.
BPS - 5th Ed. Chapter 151 Thinking about Inference.
Reasoning in Psychology Using Statistics Psychology
RESEARCH METHODS IN INDUSTRIAL PSYCHOLOGY & ORGANIZATION Pertemuan Matakuliah: D Sosiologi dan Psikologi Industri Tahun: Sep-2009.
Chapter 3 Surveys and Sampling © 2010 Pearson Education 1.
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
+ Unit 6: Comparing Two Populations or Groups Section 10.2 Comparing Two Means.
Learning Objectives After this section, you should be able to: The Practice of Statistics, 5 th Edition1 DESCRIBE the shape, center, and spread of the.
Essential Statistics Chapter 171 Two-Sample Problems.
AP STATISTICS LESSON (DAY 1) MAKING SENSE OF SATISTICAL SIGNIFICANCE.
1 Prepared by: Laila al-Hasan. 1. Definition of research 2. Characteristics of research 3. Types of research 4. Objectives 5. Inquiry mode 2 Prepared.
As a data user, it is imperative that you understand how the data has been generated and processed…
Computing Honours Project (COMP10034) Lecture 4 Primary Research.
Statistical Concepts Basic Principles An Overview of Today’s Class What: Inductive inference on characterizing a population Why : How will doing this allow.
IE 102 Lecture 6 Critical Thinking.
Section 2: Science as a Process
Research & Writing in CJ
Statistical Data Analysis
Lecture 1: Introduction Lecturer: Mr. S. Kumar
Comparing Two Proportions
Significance and t testing
Comparing Two Proportions
Statistical Data Analysis
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 16: Inference in Practice
DESIGN OF EXPERIMENTS by R. C. Baker
Biological Science Applications in Agriculture
Presentation transcript:

SINTEF Telecom and Informatics EuroSPI’99 Workshop on Data Analysis Popular Pitfalls of Data Analysis Tore Dybå, M.Sc. Research Scientist, SINTEF Telecom & Informatics Research Fellow, Norwegian University of Science and Technology

SINTEF Telecom and Informatics The Problem with Statistics Statistics requires the ability to consider things from a probabilistic perspective, employing concepts such as: “confidence”, “reliability”, and “significance”. It is not a mathematical method for finding “the right answer”. We consider three broad classes of statistical pitfalls:  Sources of bias, which affect the external validity of our results  Errors in methodology, which can lead to inaccurate results  Problems with interpretation, or how the results are (mis)applied There are three kinds of lies: Lies, Damned Lies, and Statistics. - Benjamin Disraeli

SINTEF Telecom and Informatics Sources of Bias Statistical methods assist us in making inferences about a large group based on observations of a smaller subset of that group. To make legitimate conclusions about the population, two characteristics must be present within the sample:  Representative sampling: While this might be feasible for manufacturing processes, it is much more problematic for software processes.  Statistical assumptions: The validity of a statistical procedure depends on certain assumptions about the problem, and the robustness of the applied techniques to violations in these assumptions.

SINTEF Telecom and Informatics Errors in Methodology There are numerous ways of misapplying statistical techniques to real world problems. This can lead to invalid or inaccurate results. Three of the most common hazards are:  Statistical power: If there is little statistical power, you risk overlooking the effect you are attempting to discover.  Multiple comparisons: You need to consider the complexity and the number of factor combinations that need to be examined.  Measurement error: Most statistical models assume error free measurement. However, software processes are complex, making it difficult to measure precisely.

SINTEF Telecom and Informatics Problems with Interpretation (1) Generally, the reason for analyzing process data is to draw inferences that can guide decisions and actions. Drawing such inferences from data depends not only on using appropriate analytical methods and tools. Equally important is the understanding of the underlying nature of the data and the appropriateness of assump- tions about the conditions and environments in which the data were obtained. DataAnalysisInterpretation

SINTEF Telecom and Informatics Problems with Interpretation (2) Confusion over significance: “Significance” in the statistical sense is not the same as “significance” in the practical sense. Precision and accuracy: Precision refers to how finely an estimate is specified, whereas accuracy refers to how close an estimate is to the true value. Causality: Assessing causality is the reason of most statistical analysis. Evidence of causality involves:  Concomitant variation.  Time order of occurrence.  Absence of other casual factors.

SINTEF Telecom and Informatics How Do We Make Inferences? We live in a world of self-generating beliefs which remain largely untested. We adopt those beliefs because they are based on conclusions, which are inferred from what we observe, plus our past experiences. Our ability to make “valid” inferences and achieve results is eroded by our feelings that:  Our beliefs are the truth.  The truth is obvious.  Our beliefs are based on real data.  The data we select are the real data.

SINTEF Telecom and Informatics The Ladder of Inference Observable “data” and experiences I select “ data ” from what I observe I add meanings (cultural and personal) I make assumptions I draw conclusions I adopt beliefs about the world I take actions The reflexive loop (Our beliefs affect what data we select next time) Your beliefs are the basis for actions.

SINTEF Telecom and Informatics Choosing a Model Choosing a model - that is, how we believe the data to be related - will influence all later analyses. The decision to categorize one or more observations as “wrong” - or as “outliers” - is a relationship between data and model which cannot be decided upon without some form of confidence (belief) about the data and the model. Expert opinion can be used as a method of triangulation to get subjective data to further guide the decision on choosing the “right” model.

SINTEF Telecom and Informatics Alt 1: Point 1 has high confidence, points 2 and 3 have low confidence. Alt 2: Point 1 has low confidence, points 2 and 3 have high confidence Is the Model Correct? (1)

SINTEF Telecom and Informatics Is the Model Correct? (2) Alt 1: Model a has high confidence, model b has low confidence. Alt 2: Model a has low confidence, model b has high confidence. a b

SINTEF Telecom and Informatics Summary Be sure your sample is representative of the population in which you're interested. Be sure you understand the assumptions of your statistical procedures, and be sure they are satisfied. Be sure you have the right amount of statistical power. Be sure to use the best measurement tools available. Beware of multiple comparisons. Keep clear in your mind what you're trying to discover - look at magnitudes rather than p-values. Don't confuse precision with accuracy. Be sure you understand the conditions for causal inference. Be sure your models are accurate and reflect the data variation clearly. Errors using inadequate data are much less than those using no data at all. -Charles Babbage