1 Statistical Methods in Computer Science: The Basis for Experiment Design for Hypothesis Testing
Ido Dagan

2 Reminders:
1. Instructions for participating in the experiment are on the course website.
2. Excel recitations:
   - Wednesday: in computer room 604/203
   - Thursday: same room next week; no class this week
** Your BIU-CS login should be active **

3 Empirical Methods in Computer Science © 2006-now Gal Kaminka / Ido Dagan
Experimental Lifecycle: Model/Theory --> Hypothesis --> Experiment --> Analysis

4 Proving a Theory?
Methods of proving a proposition:
- An experiment supports it
- We can mathematically prove it
Some propositions cannot be verified empirically:
- "This compiler has linear run-time"
- Infinite possible inputs --> cannot prove empirically
But they may still be disproved:
- e.g., by code that causes the compiler to run in non-linear time

5 Karl Popper's Philosophy of Science
Popper advanced a particular philosophy of science: falsifiability.
- For a theory to be considered scientific, it must be falsifiable: there must be some way to refute it, in principle.
- Not falsifiable --> not scientific.
Examples:
- "All crows are black": falsifiable by finding a white crow
- "Compiles in linear time": falsifiable by non-linear performance
A theory is tested on its predictions.

6 Proving by Disproving...
Platt ("Strong Inference", 1964) offers a specific method:
1) Devise alternative hypotheses for the observations
2) Devise experiment(s) allowing elimination of hypotheses
3) Carry out the experiments to obtain a clean result
4) Go to 1
The idea is to eliminate hypotheses by rejecting them.

7 Forming Hypotheses
To support theory X, we:
1) Construct falsification hypotheses X1, ..., Xn, ...
2) Systematically experiment to disprove X, by trying to prove each Xi
3) If all falsification hypotheses are eliminated, this lends support to the theory
Note that future falsification hypotheses may be formed: the theory must continue to hold against "attacks".
Popper: scientific evolution, "survival of the fittest theory" (e.g., Newton's theory).
How does this view hold in computer science?

8 Forming Hypotheses in CS
(1) Carefully identify the theoretical claim we are studying:
- e.g., "the relation between input size and run time is linear"
- e.g., "the display improves user performance"
(2) Identify the falsification hypothesis (null hypothesis) H0:
- e.g., "there is an input size for which run time is non-linear"
- e.g., "the display will have no effect on user performance"
(3) Now, experiment to eliminate H0.

9 The Basics of Experiment Design
Experiments identify a relation between variables X, Y, ...
- Simple experiments provide an indication of a relation: better/worse, linear or non-linear, ...
- Advanced experiments help identify causes and interactions: e.g., run time is linear in input size, but the constant factor depends on the type of data.

10 Types of Experiments and Variables
Manipulation experiments:
- Manipulate (= set the value of) independent variables (e.g., input size)
- Observe (= measure the value of) dependent variables (e.g., run time)
Observation experiments:
- Observe predictor variables (e.g., a person's height)
- Observe response variables (e.g., running speed; also system run time, if observing a system in actual use)
Other variables:
- Endogenous: on the causal path between the independent and dependent variables
- Exogenous: other variables influencing the dependent variables
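The manipulation-experiment setup above can be sketched in a few lines of Python. This is a minimal illustration, not from the slides: the sorting workload and trial count are arbitrary assumptions; the point is only that the experimenter sets the independent variable (input size) and measures the dependent one (run time).

```python
import timeit

# Sketch of a manipulation experiment: we SET the independent variable
# (input size n) and MEASURE the dependent variable (run time).
def run_trial(n):
    data = list(range(n, 0, -1))               # fixed input shape per trial
    return timeit.timeit(lambda: sorted(data), number=10)

for n in [1_000, 10_000, 100_000]:             # manipulated values
    print(f"n={n:>7}: {run_trial(n):.4f}s")
```

In an observation experiment, by contrast, `n` would not be chosen by us; we would only record whatever input sizes occur in actual use.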

11 An Example Observation Experiment
Theory: gender affects score performance.
Falsifying hypothesis: gender does not affect performance, i.e., men and women perform the same.
We cannot use a manipulation experiment, since we cannot control gender; we must use an observation experiment.

12-15 An Example Observation Experiment (after "Empirical Methods in AI", Cohen 1995)
Example records:
- Child 1: # siblings: 2, mother: artist, gender: male, height: 145cm, teacher's attitude, child confidence, test score: 650
- Child 2: # siblings: 3, mother: doctor, gender: female, height: 135cm, teacher's attitude, child confidence, test score: 720
These four slides highlight, in turn, the roles of the variables in this example: the independent (predictor) variable (gender), the dependent (response) variable (test score), the endogenous variables on the causal path (teacher's attitude, child confidence), and the exogenous variables (# siblings, mother's occupation, height).

16 Experiment Design: Introduction
Different experiment types explore different hypotheses.
A very simple design: the treatment experiment, sometimes known as a lesion study.

    Condition   V0 (independent)   V1 ... Vn (exogenous)   Dependent variable
    treatment   Ind1               Ex1, Ex2, ..., Exn      Dep1
    control     Not(Ind1)          Ex1, Ex2, ..., Exn      Dep2

- Treatment condition: the independent variable is set to "with treatment"
- Control condition: the independent variable is set to "no treatment"
- The populations are "identical" in all other variables
- This determines the relation between the categorical variable V0 and the dependent variable

17 Single-Factor Treatment Experiments
A generalization of treatment experiments that allows comparison of several conditions:

    Condition     V0 (independent)   V1 ... Vn (exogenous)   Dependent variable
    treatment 1   Ind1               Ex1, Ex2, ..., Exn      Dep1
    treatment 2   Ind2               Ex1, Ex2, ..., Exn      Dep2
    [control      Not(Ind)           Ex1, Ex2, ..., Exn      Dep3]

- Compare the performance of algorithm A to B to C ...
- Control condition: optional (e.g., to establish a baseline)
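A single-factor design of this kind can be sketched as follows. This is an illustrative sketch only: the two toy sorts stand in for "algorithm A/B/C"; the factor is "algorithm", and everything else (input data, trial count) is held identical across conditions.

```python
import random
import timeit

# Two toy algorithms acting as the "treatment" conditions.
def bubble_sort(xs):
    xs = xs[:]                                  # don't mutate the shared input
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def insertion_sort(xs):
    xs = xs[:]
    for i in range(1, len(xs)):
        key, j = xs[i], i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs

random.seed(0)                                  # identical input in every condition
data = [random.random() for _ in range(500)]
for name, algo in [("builtin", sorted),
                   ("bubble", bubble_sort),
                   ("insertion", insertion_sort)]:
    t = timeit.timeit(lambda: algo(data), number=5)
    print(f"{name:>9}: {t:.4f}s")
```

The `builtin` condition plays the role of an optional control/baseline; the exogenous variables (input distribution, machine load) are the same for all conditions by construction.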

18 Careful!
An effect on the dependent variable may not mean what we expect.
Example experiment:
- Hypothesis: a fly's ears are on its wings
- Fly with two wings: make a loud noise, observe flight
- Fly with one wing: make a loud noise, no flight
- Conclusion: a fly with only one wing cannot hear!
What's going on here?
- First, over-interpretation by the experimenter
- But also, a lack of sufficient falsifiability: there are other possible explanations for why the fly wouldn't fly; another variable (the wing) affects the dependent variable (flying)

19 Controlling for Other Factors
Often we cannot manipulate all exogenous variables; then we need to make sure they are sampled randomly, since randomization averages out their effect. This can be difficult.
E.g., suppose we are trying to relate gender and math performance:
- We control for the effect of # of siblings by random sampling
- But # of siblings may itself be related to gender: parents continue to have children hoping for a boy (Beal 1994)
- Thus # of siblings is tied to gender, and we must separate the results based on # of siblings
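"Separating the results based on # of siblings" is stratification, and can be sketched as follows. All records here are made up purely for illustration; only the grouping idea comes from the slide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records. If # of siblings is tied to gender, a raw
# gender comparison confounds the two; stratifying by sibling count
# compares like with like.
records = [
    {"gender": "F", "siblings": 1, "score": 720},
    {"gender": "F", "siblings": 1, "score": 700},
    {"gender": "M", "siblings": 1, "score": 690},
    {"gender": "F", "siblings": 3, "score": 660},
    {"gender": "M", "siblings": 3, "score": 650},
    {"gender": "M", "siblings": 3, "score": 640},
]

strata = defaultdict(list)
for r in records:
    strata[(r["siblings"], r["gender"])].append(r["score"])

# Compare genders WITHIN each sibling-count stratum.
for (sibs, gender), scores in sorted(strata.items()):
    print(f"siblings={sibs} gender={gender}: mean={mean(scores):.0f}")
```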

20 Factorial Experiment Designs
Every combination of factor values is sampled; the hope is to exclude or reveal interactions.
This creates a combinatorial number of experiments: N factors with k values each = k^N combinations.
Strategies for reducing the number of values:
- Merge values or categories; skip values
- Focus on extremes, to get the general trend (but this may hide behavior at intermediate values)
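Enumerating a full factorial design is a one-liner with `itertools.product`. The factor names and levels below are illustrative assumptions, chosen only to show the combinatorial blow-up:

```python
from itertools import product

# A full factorial design samples every combination of factor levels.
factors = {
    "input_size": [1_000, 100_000],                  # 2 levels
    "data_type":  ["sorted", "random", "reversed"],  # 3 levels
    "algorithm":  ["A", "B"],                        # 2 levels
}

conditions = list(product(*factors.values()))
print(f"{len(conditions)} experimental conditions")  # 2 * 3 * 2 = 12
for cond in conditions:
    print(dict(zip(factors, cond)))                  # keys pair with levels
```

With N factors of k levels each the list has k^N entries, which is why the slide's value-reduction strategies (merging levels, sampling extremes) matter in practice.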

21 Tips for Factorial Experiments
- For "numerical" variables, 2 value ranges are not enough: they don't give a good sense of the function relating the variables.
- Measure, measure, measure: piggybacking measurements on planned experiments is cheaper than re-running the experiments.

22 Experiment Validity
Two types of validity:
- Internal validity: the experiment shows the claimed relationship (the independent variable causes the dependent one)
- External validity: the degree to which the results generalize to other conditions
Threats: uncontrolled conditions threatening validity.

23 Internal Validity Threats: Examples
Order effects:
- Practice effects in human or animal test subjects (e.g., user performance improves over repeated user-interface tasks). Solution: randomize the order of presentation to subjects.
- A bug or side-effect in the testing system leaves the system "unclean" for the next trial: need to "clean" the system between experiments.
- If treatment/control are given in two different orders (e.g., runs with/without the new algorithm, for the same users), one order may be good for treatment and bad for control (or vice versa). Solution: counterbalancing (all possible orders).
Demand effects:
- The experimenter influences the subjects, e.g., by guiding them.
Confounding effects:
- The variable relations aren't clear; see "fly with one wing cannot hear".
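The counterbalancing solution above can be sketched directly with `itertools.permutations`. The subject IDs are hypothetical; the point is that every condition order appears, and subjects are spread across the orders:

```python
from itertools import permutations

# Counterbalancing: present the conditions in every possible order,
# so that order effects average out across subjects.
conditions = ["control", "treatment"]
orders = list(permutations(conditions))
print(orders)  # [('control', 'treatment'), ('treatment', 'control')]

# Assign subjects to orders round-robin (subject IDs are illustrative).
subjects = ["s1", "s2", "s3", "s4"]
assignment = {s: orders[i % len(orders)] for i, s in enumerate(subjects)}
for s, order in assignment.items():
    print(s, "->", order)
```

With c conditions there are c! orders, so full counterbalancing is only practical for small c; otherwise one falls back on randomizing the order per subject, as the slide's first solution suggests.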

24 External Threats to Validity
Outline:
- Sampling bias: non-representative samples (e.g., non-representative external factors)
- Floor and ceiling effects: the problems tested are too hard or too easy
- Regression effects: the results have no way to go but up (or down)
Solution approach: run pilot experiments.

25 Sampling Bias
The setting prefers measuring specific values over others. For instance:
- "Random" manual selection of mice from a cage for an experiment selects for specific values: slow, doesn't bite (not aggressive), ...
- Including only results that were found by some deadline
Solution: detect and remove the bias
- e.g., by visualization, looking for non-normal distributions
- e.g., a surprising distribution of the dependent data for different values of the independent variable

26 Baselines: Floor and Ceiling Effects
How do we know algorithm A is good? Bad? Maybe the problems are too simple? Too hard?
For example: a new machine-learning algorithm has 95% accuracy. Is this good?
Controlling for floor/ceiling effects: establish baselines
- Show whether a "silly" approach achieves a close result
- Comparison to a strawman (easy) or an ironman (hard) may be misleading if not chosen appropriately
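A minimal sketch of such a "silly" baseline, with made-up labels: a majority-class classifier. If the fancy model's 95% accuracy barely beats this, the test set was too easy and we are looking at a ceiling effect rather than a good algorithm.

```python
from collections import Counter

# Hypothetical, deliberately skewed test labels: 93% one class.
labels = ["spam"] * 93 + ["ham"] * 7

# The "silly" baseline: always predict the most common class.
majority = Counter(labels).most_common(1)[0][0]
baseline_acc = sum(lbl == majority for lbl in labels) / len(labels)
print(f"majority baseline accuracy: {baseline_acc:.0%}")  # 93%
```

Against this 93% floor-to-ceiling picture, a reported 95% is a modest gain, not a triumph; the same reasoning motivates choosing baselines that are neither strawmen nor unreachable ironmen.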

27 Regression Effects
General phenomenon: "regression towards the mean": repeated measurements converge towards the mean values.
Example threat:
- Run a program on 100 different inputs; inputs 6, 14, 15 get a very low score
- We now fix the problem that affected only these inputs, and want to re-test
- If chance has anything to do with scoring, then we must re-run all inputs. Why? The scores on 6, 14, 15 have nowhere to go but up, so re-running only these inputs will show improvement by chance.
Solution: re-run the complete tests, or sample the conditions uniformly.
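The threat is easy to demonstrate by simulation. In this sketch (all numbers invented) the scores are pure chance and nothing is ever "fixed", yet re-measuring only the worst-scoring inputs tends to show improvement:

```python
import random

# Scores are pure noise around a mean of 70: no real effect exists.
random.seed(1)
first = [random.gauss(70, 10) for _ in range(100)]

# Select the three lowest-scoring inputs (the analogue of "6, 14, 15").
worst = sorted(range(100), key=lambda i: first[i])[:3]

# "Re-test" everything, with nothing actually changed.
second = [random.gauss(70, 10) for _ in range(100)]
improved = sum(second[i] > first[i] for i in worst)
print(f"{improved}/3 of the worst inputs improved by chance alone")
```

Because the selected scores were extreme lows, their re-measurements are very likely closer to the mean, i.e., higher, which is exactly why one must re-run the complete test set rather than only the inputs that were "fixed".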

28 Summary
Defensive thinking:
- If I were trying to disprove the claim, what would I do?
- Then think of ways to counter any possible attack on the claim.
Strong Inference and Popper's falsification ideas: science moves by (empirically) disproving theories.
Experiment design:
- Ideal independent variables: easy to manipulate
- Ideal dependent variables: measurable, sensitive, and meaningful
- Carefully think through the threats to validity

