1 Experimentation in Computer Science (Part 2)

2 Experimentation in Software Engineering --- Outline
- Empirical Strategies
- Measurement
- Experiment Process

3 Experiment Process: Phases
Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions

4 Experiment Process: Phases Defined
- Experiment Idea: ask the right question (insight)
- Experiment Definition: ask the question right
- Experiment Planning: design the experiment to answer the question
- Experiment Operation: collect metrics
- Analysis and Interpretation: statistically evaluate and determine practical consequences
- Presentation: disseminate results

5 Experiment Process: Phases
Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions

6 Experiment Definition: Overview
- Formulate the experiment idea -- ask the right question
- Define goals -- why conduct the experiment
- State research questions:
  - Descriptive: what percentage of developers use OO?
  - Relational: what percentage of experienced versus novice developers use OO?
  - Causal: what is the average productivity of developers using OO versus developers using non-OO?

7 Experiment Definition: Overview – Example
How do test suite size and test case composition affect the costs and benefits of web testing methodologies?

8 Experiment Process: Phases
Experiment Idea → Experiment Definition → Experiment Planning → Experiment Operation → Analysis & Interpretation → Presentation & Package → Conclusions

9 Experiment Planning: Overview
Steps within Experiment Planning (between Experiment Definition and Experiment Operation):
Context Selection → Hypothesis Formulation → Variables Selection → Selection of Subjects → Experiment Design → Instrumentation → Validity Evaluation

10 Experiment Planning: Context Selection
- Context: environment and personnel
- Dimensions include:
  - off-line vs. on-line
  - student vs. professional personnel
  - toy vs. real problems
  - specific vs. general software engineering domain
- Selection drivers: validity vs. cost

11 Experiment Planning: Hypothesis Formulation
- Hypothesis: a formal statement related to a research question
- Forms the basis for statistical analysis of results through hypothesis testing
- Data collected in the experiment is used to, if possible, reject the hypothesis

12 Experiment Planning: Hypothesis Formulation
- There are two hypotheses for each question of interest:
  - Null hypothesis, H0: describes the state in which the prediction does not hold.
  - Alternative hypothesis, Ha (also written H1, etc.): describes the prediction we believe will be supported by evidence.
- The goal of the experiment is to reject H0 with as high significance as possible; this rejection then implies acceptance of the alternative hypothesis.

13 Experiment Planning: Hypothesis Formulation
- Hypothesis testing involves risks:
  - Type I error: the probability of rejecting a true null hypothesis. In this case we infer a pattern or relationship that does not exist.
  - Type II error: the probability of not rejecting a false null hypothesis. In this case we fail to identify a pattern or relationship that does exist.
  - Power of a statistical test: the probability that the test will reveal a true pattern if the null hypothesis is false: 1 - P(Type II error).
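
The interplay between Type I error, Type II error, and power can be seen by simulation. A minimal sketch, assuming normally distributed data and an invented two-group setup: the rejection rate when H0 is true estimates the Type I error rate, and the rejection rate when H0 is false estimates power.

```python
import random
import statistics

random.seed(1)  # reproducible illustration

def reject_h0(sample_a, sample_b, threshold=1.96):
    # Reject H0 (equal population means) with a simple z-style statistic;
    # illustrative only, assumes roughly normal data.
    n = len(sample_a)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    se = (statistics.variance(sample_a) / n + statistics.variance(sample_b) / n) ** 0.5
    return abs(diff / se) > threshold

def rejection_rate(mu_a, mu_b, n=30, trials=2000):
    # Fraction of simulated experiments in which H0 is rejected.
    hits = 0
    for _ in range(trials):
        a = [random.gauss(mu_a, 1.0) for _ in range(n)]
        b = [random.gauss(mu_b, 1.0) for _ in range(n)]
        hits += reject_h0(a, b)
    return hits / trials

type_i = rejection_rate(0.0, 0.0)  # H0 true: rejections are Type I errors (~0.05)
power = rejection_rate(0.0, 1.0)   # H0 false: rejection rate estimates power
print(type_i, power)
```

With a true mean difference of one standard deviation and 30 subjects per group, power is high; shrinking the difference or the sample size drives it down.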

14 Experiment Planning: Variable Selection
- Types of variables to select:
  - Independent: manipulated by the investigator or by nature
  - Dependent: affected by changes in the independent variables
- Also select:
  - measures and measurement scales
  - ranges for variables
  - specific levels of independent variables to be used

15 Experiment Planning: Selection of Subjects/Objects
- The selection process strongly affects the ability to generalize results
- Process for selecting subjects/objects:
  - Identify the population U
  - Draw a sample from U using a sampling technique

16 Experiment Planning: Selection of Subjects/Objects
- Probability sampling:
  - Simple random: randomly select from U
  - Systematic random: select the first subject from U at random, then select every nth after that
  - Stratified random: divide U into strata following a known distribution, then apply random sampling within each stratum
- Non-probability sampling:
  - Convenience: select the nearest, most convenient subjects
  - Quota: used to get subjects from various elements of a population; convenience sampling is used for each element
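
The three probability-sampling techniques can be sketched in a few lines. The population here (100 developers, each labeled with a "senior" or "junior" stratum) is invented for illustration:

```python
import random

random.seed(7)  # reproducible illustration

# Hypothetical population: 100 developers with an experience stratum.
population = [{"id": i, "stratum": "senior" if i % 4 == 0 else "junior"}
              for i in range(100)]

def simple_random(pop, n):
    # Simple random sampling: every subset of size n is equally likely.
    return random.sample(pop, n)

def systematic_random(pop, n):
    # Systematic sampling: random start, then every k-th element.
    k = len(pop) // n
    start = random.randrange(k)
    return pop[start::k][:n]

def stratified_random(pop, n):
    # Stratified sampling: random sampling within each stratum,
    # proportional to the stratum's share of the population.
    sample = []
    strata = {s: [p for p in pop if p["stratum"] == s]
              for s in {p["stratum"] for p in pop}}
    for members in strata.values():
        quota = round(n * len(members) / len(pop))
        sample.extend(random.sample(members, quota))
    return sample

print(len(stratified_random(population, 20)))
```

With 25 seniors in the population, a proportional stratified sample of 20 contains exactly 5 seniors, whereas a simple random sample only does so on average.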

17 Experiment Planning: Selection of Subjects/Objects
- Larger sample sizes result in lower error
- If the population has large variability, a larger sample size is needed
- Data analysis methods may influence the choice of sample size
- However: a higher sample size implies higher cost
- Hence, we want a sample as small as possible, but large enough that we can generalize!
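
The first bullet can be demonstrated directly: the error of the sample mean shrinks roughly as 1/sqrt(n). A small simulation with an invented population (mean 0, standard deviation 1):

```python
import random
import statistics

random.seed(3)  # reproducible illustration

def mean_estimate_error(n, trials=500):
    # Average absolute error of the sample mean for samples of size n
    # drawn from a population with mean 0 and standard deviation 1.
    errors = []
    for _ in range(trials):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        errors.append(abs(statistics.mean(sample)))
    return statistics.mean(errors)

small_n_error = mean_estimate_error(10)
large_n_error = mean_estimate_error(100)
print(small_n_error, large_n_error)  # the larger sample is more accurate
```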

18 Experiment Planning: Experiment Design - Principles
- Randomization: statistical methods require that observations be made from independent random variables; applies to subjects, objects, and treatments.
- Blocking: given a factor that may affect results but that we are not interested in, we block subjects, objects, or techniques w.r.t. that factor and analyze blocks independently (e.g., program in the TSE paper).
- Balancing: assign treatments such that each has an equal number of subjects; not essential, but it simplifies and strengthens statistical analysis.
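
The randomization and balancing principles combine into a simple assignment procedure; a minimal sketch with invented subject IDs and treatment names:

```python
import random

random.seed(11)  # reproducible illustration

def balanced_random_assignment(subjects, treatments):
    # Randomly assign subjects to treatments (randomization), cycling
    # through the treatments so each gets an equal share (balancing).
    shuffled = subjects[:]
    random.shuffle(shuffled)
    return {subject: treatments[i % len(treatments)]
            for i, subject in enumerate(shuffled)}

plan = balanced_random_assignment(list(range(1, 7)), ["old method", "new method"])
print(plan)
```

Blocking would add one more step: partition subjects by the nuisance factor first, then run this assignment within each block.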

19 Experiment Planning: Experiment Design - Design Types
- We will consider several, suitable for experiments with:
  - one factor with two treatments
  - one factor with more than two treatments
  - two factors with two treatments
  - more than two factors, each with two treatments
- Notation: μi is the mean of the dependent variable for treatment i

20 Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts
Design type: completely randomized
Description: simple means comparison
Example hypothesis: H0: μ1 = μ2; H1: μ1 ≠ μ2, μ1 > μ2, or μ1 < μ2
Examples of analyses: t-test, Mann-Whitney
Design table: each of the six subjects is randomly assigned to exactly one of the two treatments (marked X), e.g. three subjects per treatment.

21 Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts
Design type: completely randomized
Description: simple means comparison
Example hypothesis: H0: μ1 = μ2; H1: μ1 ≠ μ2, μ1 > μ2, or μ1 < μ2
Examples of analyses: t-test, Mann-Whitney
EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using a previous method. The factor is the method, the treatments are the old and new methods, and the dependent variable could be the number of faults found.
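
For this design, the t-test compares the two group means. A hand-rolled pooled-variance t statistic is sketched below; the fault counts are invented, and a real analysis would use a statistics library to obtain the p-value from the t distribution:

```python
import statistics

def two_sample_t(a, b):
    # Pooled-variance two-sample t statistic for H0: mu1 == mu2.
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical fault counts for the two groups of subjects (invented numbers).
old_method = [3, 4, 2, 5, 3, 4]
new_method = [6, 5, 7, 6, 8, 5]
t = two_sample_t(new_method, old_method)
print(round(t, 2))  # a large |t| is evidence against H0: mu1 == mu2
```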

22 Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts
Design type: paired comparison
Description: compares differences between techniques more precisely; beware learning effects
Example hypothesis: H0: μd = 0 (μd = mean of the differences); H1: μd ≠ 0, μd > 0, or μd < 0
Examples of analyses: paired t-test, sign test, Wilcoxon
Design table (cell values give the order in which each subject applies the two treatments):
Subjects  Trtmt 1  Trtmt 2
1         2        1
2         1        2
3         2        1
4         2        1
5         1        2
6         1        2

23 Experiment Planning: Experiment Design – 1 Fctr, 2 Trtmts
Design type: paired comparison
Description: compares differences between techniques more precisely; beware learning effects
Example hypothesis: H0: μd = 0 (μd = mean of the differences); H1: μd ≠ 0, μd > 0, or μd < 0
Examples of analyses: paired t-test, sign test, Wilcoxon
EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than a previous criterion. The factor is the criterion, the treatments are use of the old and new criteria, and the dependent variable could be the number of faults found.
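
The sign test is the simplest paired analysis: it ignores the size of each difference and only counts its direction. A sketch with invented (old, new) fault counts per subject:

```python
def sign_test_counts(pairs):
    # Sign test sketch for paired data: count positive and negative
    # differences; under H0 (no difference) the two signs are equally likely.
    diffs = [new - old for old, new in pairs]
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    return plus, minus

# Hypothetical paired fault counts (old, new) per subject -- invented numbers.
pairs = [(3, 6), (4, 5), (2, 7), (5, 6), (3, 8), (4, 5)]
print(sign_test_counts(pairs))  # (6, 0): every difference favors the new criterion
```

With all six differences positive, the two-sided binomial p-value is 2 * 0.5**6 ≈ 0.03, enough to reject H0 at the 5% level.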

24 Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts
Design type: completely randomized
Description: means comparison
Example hypothesis: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
Design table: each of the six subjects is randomly assigned to exactly one of the three treatments (marked X), e.g. two subjects per treatment.

25 Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts
Design type: completely randomized
Description: means comparison
Example hypothesis: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
EXAMPLE: Investigate whether humans using a new testing method detect faults better than humans using two previous methods. The factor is the method, the treatments are the new and the two old methods, and the dependent variable could be the number of faults found.
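
One-way ANOVA tests the hypothesis above by comparing between-group variability with within-group variability. A hand-rolled F statistic with invented fault counts for three methods (a real analysis would get the p-value from the F distribution):

```python
import statistics

def one_way_anova_f(groups):
    # One-way ANOVA F statistic: mean square between groups divided by
    # mean square within groups.
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical fault counts under three testing methods (invented numbers).
f = one_way_anova_f([[3, 4, 2, 5], [6, 5, 7, 6], [4, 5, 4, 6]])
print(round(f, 2))  # a large F is evidence against H0: all means equal
```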

26 Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts
Design type: randomized complete block
Description: compares differences; especially useful if there is large variability between subjects
Example hypothesis: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
Design table (cell values give the order in which each subject applies the three treatments):
Subjects  Trtmt 1  Trtmt 2  Trtmt 3
1         1        3        2
2         3        1        2
3         2        3        1
4         2        1        3
5         3        2        1
6         1        2        3

27 Experiment Planning: Experiment Design – 1 Fctr, 3+ Trtmts
Design type: randomized complete block
Description: compares differences; especially useful if there is large variability between subjects
Example hypothesis: H0: μ1 = μ2 = μ3 = … = μa; H1: μi ≠ μj for some (i, j)
Examples of analyses: ANOVA, Kruskal-Wallis
EXAMPLE: Investigate whether a new testing criterion facilitates fault detection better than two previous criteria. The factor is the criterion, the treatments are use of the new and the old criteria, and the dependent variable could be the number of faults found.
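
A randomized complete block assignment can be generated by giving every subject (block) all treatments in an independently randomized order. A minimal sketch with invented subject IDs and treatment labels:

```python
import random

random.seed(5)  # reproducible illustration

def complete_block_orders(subjects, treatments):
    # Randomized complete block design: every subject receives every
    # treatment, in an independently shuffled order per subject.
    plan = {}
    for subject in subjects:
        order = treatments[:]
        random.shuffle(order)
        plan[subject] = order
    return plan

plan = complete_block_orders([1, 2, 3, 4, 5, 6], ["T1", "T2", "T3"])
for subject, order in plan.items():
    print(subject, order)
```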

28 Experiment Planning: Experiment Design – 2 Fctrs, 2 Trtmts
Design type: 2*2 factorial, 2 treatments per factor
Three hypotheses:
- effect of treatment Ai
- effect of treatment Bi
- effect of the interaction between Ai and Bi
Example hypothesis: H0: τ1 = τ2 = 0; H1: at least one τi ≠ 0 (the hypothesis is instantiated for each factor and for the interaction)
Examples of analyses: ANOVA
Design table:
          Trtmt A1      Trtmt A2
Trtmt B1  Subjects 4,6  Subjects 1,7
Trtmt B2  Subjects 2,3  Subjects 5,8

29 Experiment Planning: Experiment Design – 2 Fctrs, 2 Trtmts
Design type: 2*2 factorial, 2 treatments per factor
Three hypotheses:
- effect of treatment Ai
- effect of treatment Bi
- effect of the interaction between Ai and Bi
Example hypothesis: H0: τ1 = τ2 = 0; H1: at least one τi ≠ 0 (the hypothesis is instantiated for each factor and for the interaction)
Examples of analyses: ANOVA
EXAMPLE: Investigate the regression testability of code using retest-all and regression test selection, in the case where tests are coarse-grained and the case where they are fine-grained. Factor A is the technique, Factor B is the granularity. The design is 2*2 factorial because both factors have 2 treatments and every combination of treatments occurs.
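
The three hypotheses correspond to two main effects and one interaction, all of which can be estimated from the four cell means. A sketch using invented observations for the technique/granularity example:

```python
import statistics

# Hypothetical data per (technique, granularity) cell -- invented numbers.
cells = {
    ("retest-all", "coarse"): [10, 12],
    ("retest-all", "fine"):   [14, 16],
    ("selection", "coarse"):  [11, 13],
    ("selection", "fine"):    [21, 23],
}
mean = {cell: statistics.mean(values) for cell, values in cells.items()}

# Main effect of Factor A: average response under A2 minus average under A1.
effect_a = (mean[("selection", "coarse")] + mean[("selection", "fine")]) / 2 \
         - (mean[("retest-all", "coarse")] + mean[("retest-all", "fine")]) / 2
# Main effect of Factor B, computed the same way across granularities.
effect_b = (mean[("retest-all", "fine")] + mean[("selection", "fine")]) / 2 \
         - (mean[("retest-all", "coarse")] + mean[("selection", "coarse")]) / 2
# Interaction: does the effect of B differ across the levels of A?
interaction = (mean[("selection", "fine")] - mean[("selection", "coarse")]) \
            - (mean[("retest-all", "fine")] - mean[("retest-all", "coarse")])
print(effect_a, effect_b, interaction)
```

A nonzero interaction estimate is exactly the case where neither factor can be interpreted in isolation; ANOVA then tests whether each estimate is statistically distinguishable from zero.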

30 Experiment Planning: Experiment Design – k Fctrs, 2 Trtmts
Given k factors, results can depend on each factor or on interactions among them. A 2^k design has k factors, each with two treatments, and tests all combinations. Hypotheses and analyses are the same as for the 2*2 factorial.
Design table:
Fctr A  Fctr B  Fctr C  Sbjcts
A1      B1      C1      2, 3
A2      B1      C1      1, 13
A1      B2      C1      5, 6
A2      B2      C1      10, 16
A1      B1      C2      7, 15
A2      B1      C2      8, 11
A1      B2      C2      4, 9
A2      B2      C2      12, 14
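
A full 2^k design enumerates every combination of treatment levels; for the three-factor table above that is 2^3 = 8 runs. A sketch (the factor names and levels are illustrative):

```python
from itertools import product

def full_factorial(factors):
    # Full factorial design: every combination of levels appears once.
    names = sorted(factors)
    return [dict(zip(names, combo))
            for combo in product(*(factors[n] for n in names))]

design = full_factorial({"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]})
print(len(design))  # 2^3 = 8 runs
```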

31 Experiment Planning: Experiment Design – k Fctrs, 2 Trtmts
As the number of factors grows, expense grows. If high-order interactions can be assumed to be negligible, it is possible to run a fraction of the complete factorial. This approach may be used, in particular, for exploratory studies, to identify factors having large effects. Results can be strengthened by running other fractions in sequence.
One-half fractional factorial design of the 2^k factorial design: select combinations such that if any one factor is removed, the remaining design is a full 2^(k-1) factorial.
Design table:
Fctr A  Fctr B  Fctr C  Sbjcts
A2      B1      C1      2, 3
A1      B2      C1      1, 8
A1      B1      C2      5, 6
A2      B2      C2      4, 7
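
One standard way to construct such a half fraction (an assumption here; the slide does not spell out the construction) is to code the two levels of each factor as -1/+1 and keep only the runs whose codes multiply to +1 (defining relation I = ABC for k = 3). Removing any one factor then leaves a full 2^(k-1) design in the remaining factors:

```python
from itertools import product

def half_fraction(k):
    # One-half fraction of a 2^k design: keep runs whose -1/+1 level
    # codes multiply to +1 (defining relation I = AB...).
    runs = []
    for combo in product((-1, 1), repeat=k):
        sign = 1
        for code in combo:
            sign *= code
        if sign == 1:
            runs.append(combo)
    return runs

frac = half_fraction(3)
print(frac)  # 4 of the 8 runs of the full 2^3 design
```

The four runs in the slide's table match this construction: each listed combination has an even number of low levels.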

