Experimentation in Computer Science (Part 2)

Experimentation in Software Engineering: Outline
- Empirical Strategies
- Measurement
- Experiment Process

Experiment Process: Phases (diagram): Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions

Experiment Process: Phases Defined
- Experiment Idea: ask the right question (insight)
- Experiment Definition: ask the question right
- Experiment Planning: design the experiment to answer the question
- Experiment Operation: collect metrics
- Analysis and Interpretation: statistically evaluate the results and determine practical consequences
- Presentation: disseminate results

Experiment Process: Phases (diagram): Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions

Experiment Definition: Overview
- Formulate the experiment idea: ask the right question
- Define goals: why conduct the experiment
- State research questions:
  - Descriptive: what percentage of developers use OO?
  - Relational: what percentage of experienced vs. novice developers use OO?
  - Causal: what is the average productivity of developers using OO versus developers using non-OO?

Experiment Definition: Overview – Example
How do test suite size and test case composition affect the costs and benefits of web testing methodologies?

Experiment Process: Phases (diagram): Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions

Experiment Planning: Overview (diagram): within the planning phase (between Experiment Definition and Experiment Operation), the steps are Context Selection → Hypothesis Formulation → Variables Selection → Selection of Subjects → Experiment Design → Instrumentation → Validity Evaluation

Experiment Planning: Context Selection
- Context: environment and personnel
- Dimensions include:
  - off-line vs. on-line
  - student vs. professional personnel
  - toy vs. real problems
  - specific vs. general software engineering domain
- Selection drivers: validity vs. cost

Experiment Planning: Hypothesis Formulation
- Hypothesis: a formal statement related to a research question
- Forms the basis for statistical analysis of the results through hypothesis testing
- Data collected in the experiment is used to, if possible, reject the hypothesis

Experiment Planning: Hypothesis Formulation
- There are two hypotheses for each question of interest:
  - Null hypothesis, H0: describes the state in which the prediction does not hold.
  - Alternative hypothesis, Ha (also written H1): describes the prediction we believe will be supported by evidence.
- The goal of the experiment is to reject H0 with as high significance as possible; this rejection then implies acceptance of the alternative hypothesis.

Experiment Planning: Hypothesis Formulation
- Hypothesis testing involves risks:
  - Type I error: the probability of rejecting a true null hypothesis; in this case we infer a pattern or relationship that does not exist.
  - Type II error: the probability of not rejecting a false null hypothesis; in this case we fail to identify a pattern or relationship that does exist.
- Power of a statistical test: the probability that the test will reveal a true pattern if the null hypothesis is false, i.e., 1 - P(Type II error).
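
A minimal sketch of how these error rates can be made concrete by simulation, assuming numpy and scipy are available; the sample size, effect size, and number of runs are illustrative only, not taken from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, runs = 0.05, 20, 5000

def rejection_rate(effect):
    """Fraction of simulated experiments in which H0 (equal means) is rejected."""
    rejections = 0
    for _ in range(runs):
        a = rng.normal(0.0, 1.0, n)      # treatment 1
        b = rng.normal(effect, 1.0, n)   # treatment 2, mean shifted by 'effect'
        _, p = stats.ttest_ind(a, b)
        rejections += p < alpha
    return rejections / runs

print("Type I error (true H0, effect = 0):", rejection_rate(0.0))   # close to alpha
print("Power (false H0, effect = 0.8 sd): ", rejection_rate(0.8))   # 1 - P(Type II error)
```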

Experiment Planning: Variable Selection
- Types of variables to select:
  - Independent: manipulated by the investigator or by nature
  - Dependent: affected by changes in the independent variables
- Also select:
  - Measures and measurement scales
  - Ranges for the variables
  - Specific levels of the independent variables to be used

Experiment Planning: Selection of Subjects/Objects
- The selection process strongly affects the ability to generalize results
- Process for selecting subjects/objects:
  - Identify the population U
  - Draw a sample from U using a sampling technique

Experiment Planning: Selection of Subjects/Objects
- Probability sampling:
  - Simple random: randomly select from U
  - Systematic random: select the first subject from U at random, then select every nth subject after that
  - Stratified random: divide U into strata following a known distribution, then sample randomly within each stratum
- Non-probability sampling:
  - Convenience: select the nearest, most convenient subjects
  - Quota: used to get subjects from various elements of a population; convenience sampling is used within each element
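
A minimal sketch of the probability-sampling schemes in Python; the population of developer IDs and the novice/experienced split are invented purely for illustration.

```python
import random

random.seed(1)
population = [f"dev{i:03d}" for i in range(100)]   # the population U

# Simple random sampling: every member of U has the same chance of selection.
simple = random.sample(population, 10)

# Systematic random sampling: random starting point, then every nth member.
n = len(population) // 10
start = random.randrange(n)
systematic = population[start::n]

# Stratified random sampling: divide U into strata (here an assumed
# novice/experienced split), then sample randomly within each stratum.
strata = {"novice": population[:60], "experienced": population[60:]}
stratified = {name: random.sample(members, 5) for name, members in strata.items()}

print(simple, systematic, stratified, sep="\n")
```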

Experiment Planning: Selection of Subjects/Objects
- Larger sample sizes result in lower error
- If the population has large variability, a larger sample size is needed
- Data analysis methods may influence the choice of sample size
- However, a larger sample size implies higher cost
- Hence, we want a sample as small as possible, but large enough that we can generalize
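
A minimal sketch of one way to balance "as small as possible" against "large enough": a power analysis for a two-sample t-test, assuming statsmodels is available; the effect size, significance level, and power target are illustrative only.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # assumed medium effect size
                                   alpha=0.05,       # significance level
                                   power=0.8)        # desired 1 - P(Type II error)
print(f"Subjects needed per treatment: {n_per_group:.1f}")
```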

Experiment Planning: Experiment Design – Principles
- Randomization: statistical methods require that observations be made from independent random variables; applies to subjects, objects, and treatments.
- Blocking: given a factor that may affect the results but that we are not interested in, we block subjects, objects, or techniques with respect to that factor and analyze the blocks independently (e.g., program in the TSE paper).
- Balancing: assign treatments such that each has an equal number of subjects; not essential, but it simplifies and strengthens the statistical analysis.
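
A minimal sketch of randomization combined with balancing for two treatments; the subject IDs are made up, and blocking would additionally require shuffling and splitting within each block.

```python
import random

random.seed(2)
subjects = [f"S{i}" for i in range(1, 13)]
random.shuffle(subjects)                 # randomization: random order of subjects
half = len(subjects) // 2
assignment = {
    "Trtmt 1": subjects[:half],          # balancing: equal number of subjects
    "Trtmt 2": subjects[half:],          # per treatment
}
print(assignment)
```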

Experiment Planning: Experiment Design – Design Types
- We will consider several design types, suitable for experiments with:
  - One factor with two treatments
  - One factor with more than two treatments
  - Two factors with two treatments
  - More than two factors, each with two treatments
- Notation: μi is the mean of the dependent variable for treatment i

Experiment Planning: Experiment Design – 1 Factor, 2 Treatments
Design type: completely randomized
Description: simple comparison of means
Example hypotheses: H0: μ1 = μ2; H1: μ1 ≠ μ2, μ1 > μ2, or μ1 < μ2
Examples of analyses: t-test, Mann-Whitney
Assignment table: each of subjects 1–6 is randomly assigned to exactly one treatment (Trtmt 1 or Trtmt 2)

Experiment Planning: Experiment Design – 1 Factor, 2 Treatments
Design type: completely randomized
Description: simple comparison of means
Example hypotheses: H0: μ1 = μ2; H1: μ1 ≠ μ2, μ1 > μ2, or μ1 < μ2
Examples of analyses: t-test, Mann-Whitney
EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using a previous method. The factor is the method, the treatments are the old and new methods, and the dependent variable could be the number of faults found.
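
A minimal sketch of the analysis step for this example, assuming scipy is available; the fault counts are invented for illustration.

```python
from scipy import stats

faults_old = [3, 5, 4, 2, 6, 4]   # faults found by subjects using the old method
faults_new = [6, 7, 5, 8, 6, 7]   # faults found by subjects using the new method

t_stat, p_t = stats.ttest_ind(faults_new, faults_old)      # parametric analysis
u_stat, p_u = stats.mannwhitneyu(faults_new, faults_old)   # non-parametric analysis
print(f"t-test p = {p_t:.3f}, Mann-Whitney p = {p_u:.3f}")
# Reject H0 (mu1 = mu2) if p is below the chosen significance level (e.g., 0.05).
```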

Experiment Planning: Experiment Design – 1 Factor, 2 Treatments
Design type: paired comparison
Description: compares differences between techniques more precisely; beware of learning effects
Example hypotheses: H0: μd = 0 (μd = mean of the differences); H1: μd ≠ 0, μd > 0, or μd < 0
Examples of analyses: paired t-test, sign test, Wilcoxon
Assignment table: each subject applies both treatments (Trtmt 1 and Trtmt 2)

Experiment Planning: Experiment Design – 1 Factor, 2 Treatments
Design type: paired comparison
Description: compares differences between techniques more precisely; beware of learning effects
Example hypotheses: H0: μd = 0 (μd = mean of the differences); H1: μd ≠ 0, μd > 0, or μd < 0
Examples of analyses: paired t-test, sign test, Wilcoxon
EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than a previous criterion. The factor is the criterion, the treatments are the use of the old and new criteria, and the dependent variable could be the number of faults found.
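
A minimal sketch of the paired analysis for this example, assuming scipy is available; each position in the two lists belongs to the same subject, and the fault counts are invented.

```python
from scipy import stats

faults_old = [4, 3, 5, 2, 6, 4, 3, 5]   # each subject with the old criterion
faults_new = [6, 4, 7, 4, 7, 5, 4, 6]   # the same subjects with the new criterion

t_stat, p_t = stats.ttest_rel(faults_new, faults_old)   # paired t-test on the differences
w_stat, p_w = stats.wilcoxon(faults_new, faults_old)    # Wilcoxon signed-rank test
print(f"paired t-test p = {p_t:.3f}, Wilcoxon p = {p_w:.3f}")
# Reject H0 (mu_d = 0) if p is below the chosen significance level.
```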

Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments
Design type: completely randomized
Description: comparison of means
Example hypotheses: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
Assignment table: each of subjects 1–6 is randomly assigned to exactly one of the three treatments

Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments
Design type: completely randomized
Description: comparison of means
Example hypotheses: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using two previous methods. The factor is the method, the treatments are the new method and the two old methods, and the dependent variable could be the number of faults found.
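
A minimal sketch of the analysis for this three-treatment example, assuming scipy is available; the fault counts are invented.

```python
from scipy import stats

faults_method_a = [3, 4, 2, 5, 4]   # old method 1
faults_method_b = [5, 6, 4, 6, 5]   # old method 2
faults_method_c = [7, 6, 8, 7, 6]   # new method

f_stat, p_anova = stats.f_oneway(faults_method_a, faults_method_b, faults_method_c)
h_stat, p_kw = stats.kruskal(faults_method_a, faults_method_b, faults_method_c)
print(f"ANOVA p = {p_anova:.3f}, Kruskal-Wallis p = {p_kw:.3f}")
# A small p suggests mu_i != mu_j for at least one pair of treatments;
# a post-hoc comparison would be needed to identify which pair differs.
```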

Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments
Design type: randomized complete block
Description: compares differences; especially useful when there is large variability between subjects
Example hypotheses: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
Assignment table: each subject applies all three treatments (Trtmt 1, Trtmt 2, and Trtmt 3)

Experiment Planning: Experiment Design – 1 Factor, 3+ Treatments
Design type: randomized complete block
Description: compares differences; especially useful when there is large variability between subjects
Example hypotheses: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than two previous criteria. The factor is the criterion, the treatments are the use of the new and old criteria, and the dependent variable could be the number of faults found.
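
A minimal sketch of an analysis that respects the blocking on subjects, assuming pandas and statsmodels are available; the subject IDs, criterion names, and fault counts are invented.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "subject":   ["S1", "S1", "S1", "S2", "S2", "S2",
                  "S3", "S3", "S3", "S4", "S4", "S4"],
    "criterion": ["c1", "c2", "c3"] * 4,
    "faults":    [3, 5, 6, 2, 4, 5, 4, 6, 8, 3, 4, 6],
})

# Treatment (criterion) effect with subject as a blocking factor; no interaction term.
model = smf.ols("faults ~ C(criterion) + C(subject)", data=data).fit()
print(anova_lm(model, typ=2))
```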

Experiment Planning: Experiment Design – 2 Factors, 2 Treatments
Design type: 2*2 factorial
Three hypotheses:
- the effect of treatment Ai
- the effect of treatment Bi
- the effect of the interaction between Ai and Bi
Example hypotheses: H0: τ1 = τ2 = 0; H1: at least one τi ≠ 0 (hypotheses instantiated for each factor and for the interaction)
Examples of analyses: ANOVA
Assignment table:
- Trtmt A1, Trtmt B1: subjects 4, 6
- Trtmt A2, Trtmt B1: subjects 1, 7
- Trtmt A1, Trtmt B2: subjects 2, 3
- Trtmt A2, Trtmt B2: subjects 5, 8

Experiment Planning: Experiment Design – 2 Factors, 2 Treatments
Design type: 2*2 factorial
Three hypotheses:
- the effect of treatment Ai
- the effect of treatment Bi
- the effect of the interaction between Ai and Bi
Example hypotheses: H0: τ1 = τ2 = 0; H1: at least one τi ≠ 0 (hypotheses instantiated for each factor and for the interaction)
Examples of analyses: ANOVA
EXAMPLE: Investigate the regression testability of code using retest-all and regression test selection, in the case where tests are coarse-grained and the case where they are fine-grained. Factor A is the technique, Factor B is the granularity. The design is 2*2 factorial because both factors have 2 treatments and every combination of treatments occurs.
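
A minimal sketch of a two-way ANOVA with interaction for this 2*2 example, assuming pandas and statsmodels are available; the level names and cost values are invented.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "technique":   ["retest_all", "retest_all", "selection", "selection"] * 2,
    "granularity": ["coarse", "fine"] * 4,
    "cost":        [10.2, 7.1, 6.3, 4.0, 9.8, 7.5, 6.0, 4.4],
})

# Tests the effect of Factor A (technique), Factor B (granularity),
# and the A:B interaction.
model = smf.ols("cost ~ C(technique) * C(granularity)", data=data).fit()
print(anova_lm(model, typ=2))
```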

Experiment Planning: Experiment Design – k Factors, 2 Treatments
Given k factors, results can depend on each factor or on interactions among the factors. A 2^k design has k factors, each with two treatments, and tests all combinations. Hypotheses and analyses are the same as for the 2*2 factorial.
Assignment table (2^3 example):
- A1, B1, C1: subjects 2, 3
- A2, B1, C1: subjects 1, 13
- A1, B2, C1: subjects 5, 6
- A2, B2, C1: subjects 10, 16
- A1, B1, C2: subjects 7, 15
- A2, B1, C2: subjects 8, 11
- A1, B2, C2: subjects 4, 9
- A2, B2, C2: subjects 12, 14

Experiment Planning: Experiment Design – k Factors, 2 Treatments
As the number of factors grows, expense grows. If high-order interactions can be assumed to be negligible, it is possible to run only a fraction of the complete factorial. This approach may be used, in particular, for exploratory studies, to identify the factors having large effects. Results can be strengthened by running other fractions in sequence.
One-half fractional factorial design of the 2^k factorial design: select combinations such that if any one factor is removed, the remaining design is a full 2^(k-1) factorial.
Assignment table (one-half fraction of the 2^3 design):
- A2, B1, C1: subjects 2, 3
- A1, B2, C1: subjects 1, 8
- A1, B1, C2: subjects 5, 6
- A2, B2, C2: subjects 4, 7
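
A minimal sketch of how the runs of a full 2^k design and of a one-half fraction can be enumerated; the coded -1/+1 levels and the selection rule "product of all levels = +1" are standard conventions, not taken from the slides.

```python
from itertools import product
from math import prod

k = 3
full = list(product([-1, +1], repeat=k))          # all 2^k treatment combinations
half = [run for run in full if prod(run) == +1]   # keep runs whose levels multiply to +1

print("full 2^k design:   ", full)
print("one-half fraction: ", half)
# Dropping any single factor from the half fraction leaves a full 2^(k-1) design,
# which is exactly the selection rule stated on the slide.
```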