
1 Human causal induction Tom Griffiths Department of Psychology Cognitive Science Program University of California, Berkeley

2 Three kinds of causal induction

3 contingency data

4 Dr. James Currie

5 Currie (1798), Medical Reports on the Effects of Water, Cold and Warm, as a Remedy in Fevers and Febrile Diseases: “The contagion spread rapidly and before its progress could be arrested, sixteen persons were affected, of which two died. Of these sixteen, eight were under my care. On this occasion I used for the first time the affusion of cold water in the manner described by Dr. Wright. It was first tried in two cases... The effects corresponded exactly with those mentioned by him to have occurred in his own case and thus encouraged the remedy was employed in five other cases. It was repeated daily, and of these seven patients, the whole recovered.”

6 “Does the treatment cause recovery?” (Currie, 1798)

              Recovered   Died
  Treated         7         0
  Untreated       7         2

7 Causation from contingencies
“Does C cause E?” (rate on a scale from 0 to 100)

                    E present (e+)   E absent (e-)
  C present (c+)          a                b
  C absent (c-)           c                d

8 Three kinds of causal induction contingency data physical systems

9 Blicket detector (Gopnik, Sobel, and colleagues) See this? It’s a blicket machine. Blickets make it go. Let’s put this one on the machine. Oooh, it’s a blicket!

10 Three kinds of causal induction contingency data physical systems perceived causality

11 Michotte (1963)

13 Three kinds of causal induction contingency data physical systems perceived causality

15 Causation from contingencies
“Does C cause E?” (rate on a scale from 0 to 100)

                    E present (e+)   E absent (e-)
  C present (c+)          a                b
  C absent (c-)           c                d

16 Two models of causal judgment
Delta-P (Jenkins & Ward, 1965):
  ΔP = P(e+|c+) - P(e+|c-)
Power PC (Cheng, 1997):
  power = ΔP / (1 - P(e+|c-))

17
              Recovered   Died
  Treated         7         0
  Untreated       7         2

P(e+|c+) = 7/7 = 1.00
P(e+|c-) = 7/9 = 0.78
ΔP = 1.00 - 0.78 = 0.22
power = 0.22 / 0.22 = 1.00
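
Both quantities are one-liners. A minimal Python sketch (mine, not from the talk), applied to Currie's table:

```python
# Sketch: Delta-P and causal power from the 2x2 contingency table,
# using the cell counts a, b, c, d defined on the earlier slide.

def delta_p(a, b, c, d):
    """Delta-P = P(e+|c+) - P(e+|c-) (Jenkins & Ward, 1965)."""
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    """Causal power = Delta-P / (1 - P(e+|c-)) (Cheng, 1997)."""
    return delta_p(a, b, c, d) / (1 - c / (c + d))

# Currie's data: treated 7 recovered / 0 died; untreated 7 / 2.
print(delta_p(7, 0, 7, 2))       # 0.222...
print(causal_power(7, 0, 7, 2))  # 1.0
```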

18 Buehner and Cheng (1997)
[bar charts comparing human judgments (“People”) with ΔP and causal power predictions across contingency conditions from 0.00 to 1.00]

19 Buehner and Cheng (1997): constant ΔP, changing judgments

20 Buehner and Cheng (1997): constant causal power, changing judgments

21 Buehner and Cheng (1997): ΔP = 0, changing judgments

22 What is the computational problem?
ΔP and causal power both seem to capture some part of causal induction, and miss something else…
How can we formulate the problem of causal induction that people are trying to solve?
– perhaps this way we can discover what's missing
A first step: a language for talking about causal relationships…

23 Bayesian networks
Four random variables:
  X1: coin toss produces heads
  X2: pencil levitates
  X3: friend has psychic powers
  X4: friend has two-headed coin
Graph: X3 → X1, X4 → X1, X3 → X2, with distributions P(x3), P(x4), P(x2|x3), P(x1|x3, x4)
Nodes: variables
Links: dependency
Each node has a conditional probability distribution
Data: observations of x1, ..., x4

24 Causal Bayesian networks
Four random variables:
  X1: coin toss produces heads
  X2: pencil levitates
  X3: friend has psychic powers
  X4: friend has two-headed coin
Graph: X3 → X1, X4 → X1, X3 → X2
Nodes: variables
Links: causality
Each node has a conditional probability distribution
Data: observations of and interventions on x1, ..., x4

25 Interventions
Same network: X3 → X1, X4 → X1, X3 → X2
Intervention: hold down the pencil (set X2)
Cut all incoming links for the node that we intervene on
Compute probabilities with the “mutilated” Bayes net
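
As a sketch of this graph surgery on the network above (the conditional probabilities below are illustrative numbers I chose, not values from the talk):

```python
# Sketch: intervention by cutting the incoming links of the node we act
# on, then computing probabilities in the mutilated network.

import itertools

VARS = ["x1", "x2", "x3", "x4"]

def cpd(var, value, assign):
    """P(var = value | parents), with made-up illustrative numbers."""
    if var == "x3":                      # friend has psychic powers
        p = 0.01
    elif var == "x4":                    # friend has a two-headed coin
        p = 0.01
    elif var == "x2":                    # pencil levitates, given x3
        p = 0.9 if assign["x3"] else 0.001
    else:                                # x1: heads, given x3 and x4
        p = 1.0 if assign["x4"] else (0.99 if assign["x3"] else 0.5)
    return p if value else 1 - p

def prob(query, do=None):
    """P(query) in the network mutilated by the intervention `do`."""
    do = do or {}
    total = 0.0
    for vals in itertools.product([0, 1], repeat=len(VARS)):
        assign = dict(zip(VARS, vals))
        if any(assign[v] != val for v, val in do.items()):
            continue                     # inconsistent with intervention
        p = 1.0
        for v in VARS:
            if v not in do:              # intervened node loses its CPD
                p *= cpd(v, assign[v], assign)
        if all(assign[v] == val for v, val in query.items()):
            total += p
    return total

# Observing the pencil levitate is strong evidence for psychic powers:
print(prob({"x3": 1, "x2": 1}) / prob({"x2": 1}))   # ~0.9
# Holding the pencil down (intervening on x2) is no evidence at all:
print(prob({"x3": 1}, do={"x2": 0}))                # 0.01, the prior
```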

26 What is the computational problem?
Strength: how strong is a relationship?
Structure: does a relationship exist?
[graphs: h1 = B → E ← C versus h0 = B → E, with B a background cause]

27 What is the computational problem?
Strength: how strong is a relationship?

28 What is the computational problem?
Strength: how strong is a relationship?
– requires defining the nature of the relationship

29 Parameterization
Structures: h1 = B → E ← C;  h0 = B → E
Generic (Bernoulli) parameterization:

  C  B    h1: P(E=1 | C, B)    h0: P(E=1 | C, B)
  0  0          p00                  p0
  1  0          p10                  p0
  0  1          p01                  p1
  1  1          p11                  p1

30 Parameterization
Structures: h1 = B → E ← C;  h0 = B → E
w0, w1: strength parameters for B, C
Linear parameterization:

  C  B    h1: P(E=1 | C, B)    h0: P(E=1 | C, B)
  0  0          0                    0
  1  0          w1                   0
  0  1          w0                   w0
  1  1          w1 + w0              w0

31 Parameterization
Structures: h1 = B → E ← C;  h0 = B → E
w0, w1: strength parameters for B, C
“Noisy-OR” parameterization:

  C  B    h1: P(E=1 | C, B)    h0: P(E=1 | C, B)
  0  0          0                    0
  1  0          w1                   0
  0  1          w0                   w0
  1  1          w1 + w0 - w1 w0      w0
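
A small sketch (function names mine) showing how the two parameterizations combine the strengths:

```python
# Sketch: P(E=1 | B, C) under the linear and noisy-OR parameterizations,
# with w0 the strength of the background B and w1 the strength of C.

def linear(b, c, w0, w1):
    # Strengths add (valid only while the sum stays in [0, 1]).
    return w0 * b + w1 * c

def noisy_or(b, c, w0, w1):
    # E occurs unless every present cause independently fails.
    return 1 - (1 - w0) ** b * (1 - w1) ** c

# Reproducing the bottom rows of the tables (B and C both present):
print(linear(1, 1, w0=0.5, w1=0.3))    # 0.8  = w1 + w0
print(noisy_or(1, 1, w0=0.5, w1=0.3))  # 0.65 = w1 + w0 - w1*w0
```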

32 What is the computational problem?
Structure: does a relationship exist?
[graphs: h1 = B → E ← C versus h0 = B → E]

33 Approaches to structure learning
Constraint-based (Pearl, 2000; Spirtes et al., 1993):
– dependency from statistical tests
– deduce structure from dependencies

36 Approaches to structure learning
Constraint-based (Pearl, 2000; Spirtes et al., 1993):
– dependency from statistical tests
– deduce structure from dependencies
Attempts to reduce an inductive problem to a deductive problem

37 Observationally equivalent: a → b and a ← b
P(a) P(b|a) = P(b) P(a|b)
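
A quick numeric check of this equivalence, with made-up numbers:

```python
# Sketch: any joint built as P(a)P(b|a) can be re-factored as P(b)P(a|b).

p_a = 0.3
p_b_given_a = {0: 0.2, 1: 0.9}

def joint(a, b):
    pa = p_a if a else 1 - p_a
    pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
    return pa * pb

p_b = joint(0, 1) + joint(1, 1)
p_a_given_b = joint(1, 1) / p_b

# The re-factored product matches the original joint exactly:
print(joint(1, 1), p_b * p_a_given_b)   # both 0.27
```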

38 Two distinguishable networks
Common cause (b ← a → c): P(a) P(b|a) P(c|a)
Common effect (b → a ← c): P(b) P(c) P(a|b,c)
These factorizations are not equal in general, so the two structures can be told apart from observational data.

39 Indistinguishable from observation:
a → b → c,  a ← b ← c,  a ← b → c

40 Effect of intervention
[the same three networks, each with an intervention x on b]

41 Effect of intervention
…under an intervention on b, the networks become distinguishable

42 Approaches to structure learning
Constraint-based (Pearl, 2000; Spirtes et al., 1993):
– dependency from statistical tests
– deduce structure from dependencies
Bayesian (Heckerman, 1998; Friedman, 1999):
– compute the posterior probability of structures given observed data:
  P(h|d) ∝ P(d|h) P(h)
– compare P(h1|data) with P(h0|data)

43 Bayesian Occam's Razor
For any model h, Σ_d P(d|h) = 1 over all possible data sets d.
h1 (relationship) spreads its probability over more data sets than h0 (no relationship), so it assigns less probability to each data set it can accommodate.
[plot: P(d|h) across all possible data sets d, for h0 and h1]

44 Causal structure vs. causal strength
Strength: how strong is a relationship? (the parameters w0, w1)
Structure: does a relationship exist? (h1 = B → E ← C versus h0 = B → E)

45 Causal strength
Assume the structure B → E ← C.
ΔP and causal power are maximum likelihood estimates of the strength parameter w1, under different parameterizations for P(E|B,C):
– linear → ΔP
– noisy-OR → causal power

46 Causal structure
Hypotheses: h1 = B → E ← C;  h0 = B → E
Bayesian causal inference:
  support = log [ P(d|h1) / P(d|h0) ]
The likelihood ratio (Bayes factor) gives the evidence in favor of h1.
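
A sketch of computing support (the uniform priors on the strengths and the grid approximation are my assumptions, following the noisy-OR tables above):

```python
# Sketch: causal support = log P(d|h1) / P(d|h0), marginalizing the
# strength parameters on a grid under uniform priors.

import numpy as np

def causal_support(a, b, c, d, n=201):
    w = np.linspace(0.0, 1.0, n)
    # h0: no C -> E link; P(e+) = w0 in every trial.
    like0 = w ** (a + c) * (1 - w) ** (b + d)
    # h1: noisy-OR combination of background (w0) and cause (w1).
    w0, w1 = np.meshgrid(w, w)
    p_plus = w0 + w1 - w0 * w1            # P(e+ | c+)
    like1 = p_plus ** a * (1 - p_plus) ** b * w0 ** c * (1 - w0) ** d
    # Means over the grid approximate the integrals under uniform priors.
    return np.log(like1.mean() / like0.mean())

# Currie's data: positive support, i.e. evidence for the causal link.
print(causal_support(7, 0, 7, 2))
```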

47 Buehner and Cheng (1997)
[model fits to the human judgments: ΔP (r = 0.89), power (r = 0.88), support (r = 0.97)]

48 Generativity is essential
The predictions result from a “ceiling effect”, which only matters if you believe a cause increases the probability of an effect (this follows from the use of noisy-OR, after Cheng, 1997).
[bar chart: Bayesian support as P(e+|c+) and P(e+|c-) each range over 0/8, 2/8, 4/8, 6/8, 8/8]

49 Three kinds of causal induction contingency data physical systems perceived causality

50 Blicket detector (Gopnik, Sobel, and colleagues) See this? It’s a blicket machine. Blickets make it go. Let’s put this one on the machine. Oooh, it’s a blicket!

51 “Backwards blocking” (Sobel, Tenenbaum & Gopnik, 2004)
– Two objects: A and B
– Trial 1: A and B on detector – detector active
– Trial 2: A on detector – detector active
– 4-year-olds judge whether each object is a blicket
  A: a blicket (100% say yes)
  B: probably not a blicket (34% say yes)

52 Possible hypotheses
[all 25 possible causal graph structures over A, B, and E]

53 Bayesian inference
Evaluating causal models in light of data:
  P(h|d) = P(d|h) P(h) / Σ_{h' ∈ H} P(d|h') P(h')
Inferring a particular causal relation:
  P(A → E | d) = Σ_{h ∈ H} P(A → E | h) P(h|d)

54 Bayesian inference
With a uniform prior on hypotheses, and the generic parameterization, integrating over parameters (Cooper & Herskovits, 1992):
[bar chart: the probabilities of A and B being blickets come out nearly equal, 0.32 and 0.34]
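
A sketch of the per-node marginal likelihood behind that computation, in a simplified binary-variable form of the Cooper & Herskovits (1992) score (the uniform Beta(1,1) parameter prior is my assumption):

```python
# Sketch: marginal likelihood of a binary node given its parents. For a
# parent configuration seen with n1 ones and n0 zeros, the conditional
# probability integrates out to n1! n0! / (n1 + n0 + 1)!.

from math import factorial
from collections import defaultdict

def node_marginal_likelihood(data, child, parents):
    counts = defaultdict(lambda: [0, 0])     # parent config -> [n0, n1]
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[config][row[child]] += 1
    score = 1.0
    for n0, n1 in counts.values():
        score *= factorial(n1) * factorial(n0) / factorial(n0 + n1 + 1)
    return score

# The two backwards-blocking trials as data:
data = [{"A": 1, "B": 1, "E": 1},   # trial 1: A and B on, detector active
        {"A": 1, "B": 0, "E": 1}]   # trial 2: A alone, detector active

# Scoring E's possible parent sets shows the Bayesian Occam's razor:
print(node_marginal_likelihood(data, "E", ["A"]))        # 1/3
print(node_marginal_likelihood(data, "E", ["A", "B"]))   # 1/4
```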

55 Theory-based causal induction
Theory generates a hypothesis space (causal graphs over X, Y, Z); each hypothesis generates data, e.g.:

  Case  X  Y  Z
   1    1  0  1
   2    0  1  1
   3    1  1  1
   4    0  0  0
   ...

Bayesian inference runs back up this hierarchy.

56 An analogy to language
Theory → hypothesis space of causal graphs → data (the cases above)
Grammar → parse trees → sentence (“The quick brown fox …”)

57 Theory-based causal induction
A causal theory is a hypothesis space generator.

  Component of theory    Generates
  Ontology               Variables
  Plausible relations    Structure
  Functional form        Conditional probabilities

Hypotheses are evaluated by Bayesian inference: P(h|data) ∝ P(data|h) P(h)

58 Theory
Ontology
– Types: Block, Detector, Trial
– Predicates: Contact(Block, Detector, Trial), Active(Detector, Trial)

59 Theory
Ontology
– Types: Block, Detector, Trial
– Predicates: Contact(Block, Detector, Trial), Active(Detector, Trial)
Variables:
  A = 1 if Contact(block A, detector, trial), else 0
  B = 1 if Contact(block B, detector, trial), else 0
  E = 1 if Active(detector, trial), else 0

60 Theory
Plausible relations
– For any Block b and Detector d, with prior probability q: for all trials t, Contact(b,d,t) → Active(d,t)
“A → E” means “A is a blicket”. Four hypotheses:
  h00 (no links)           P(h00) = (1 - q)^2
  h10 (A → E)              P(h10) = q(1 - q)
  h01 (B → E)              P(h01) = (1 - q)q
  h11 (A → E, B → E)       P(h11) = q^2
No hypotheses with E → B, E → A, A → B, etc.

61 Theory
Functional form of causal relations
– Causes of Active(d,t) are independent mechanisms, with causal strengths w_b. A background cause has strength w_0. Assume a deterministic mechanism: w_b = 1, w_0 = 0.

                       h00  h10  h01  h11
  P(E=1 | A=0, B=0):    0    0    0    0
  P(E=1 | A=1, B=0):    0    1    0    1
  P(E=1 | A=0, B=1):    0    0    1    1
  P(E=1 | A=1, B=1):    0    1    1    1

  P(h00) = (1 - q)^2, P(h10) = q(1 - q), P(h01) = (1 - q)q, P(h11) = q^2

62 Theory
Ontology
– Types: Block, Detector, Trial
– Predicates: Contact(Block, Detector, Trial), Active(Detector, Trial)
Plausible relations
– For any Block b and Detector d, with prior probability q: for all trials t, Contact(b,d,t) → Active(d,t)
Functional form of causal relations
– Causes of Active(d,t) are independent mechanisms, with causal strengths w_i. A background cause has strength w_0. Assume a deterministic mechanism: w_b = 1, w_0 = 0

63 Bayesian inference
Evaluating causal models in light of data:
  P(h|d) = P(d|h) P(h) / Σ_{h' ∈ H} P(d|h') P(h')
Inferring a particular causal relation:
  P(A → E | d) = Σ_{h ∈ H} P(A → E | h) P(h|d)

64 Modeling backwards blocking
Start with all four hypotheses:

                       h00  h10  h01  h11
  P(E=1 | A=0, B=0):    0    0    0    0
  P(E=1 | A=1, B=0):    0    1    0    1
  P(E=1 | A=0, B=1):    0    0    1    1
  P(E=1 | A=1, B=1):    0    1    1    1

  P(h00) = (1 - q)^2, P(h10) = q(1 - q), P(h01) = (1 - q)q, P(h11) = q^2

65 Modeling backwards blocking
Trial 1: A and B on the detector, detector active (A=1, B=1, E=1).
h00 predicts P(E=1 | A=1, B=1) = 0 and is ruled out; h10, h01, and h11 all predict 1 and survive, with relative priors q(1 - q), (1 - q)q, q^2.

66 Modeling backwards blocking
Trial 2: A alone on the detector, detector active (A=1, B=0, E=1).
h01 predicts P(E=1 | A=1, B=0) = 0 and is ruled out; h10 and h11 both predict 1 and survive.
So P(A is a blicket) = 1, and P(B is a blicket) = q^2 / [q(1 - q) + q^2] = q: B falls back to its prior.
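
The full backwards-blocking posterior under this theory fits in a few lines. A sketch (function and variable names mine):

```python
# Sketch: posterior over the four blicket hypotheses, with prior q per
# block and the deterministic detector (w_b = 1, w_0 = 0).

def blicket_posteriors(trials, q=0.3):
    # Hypotheses (a, b): a = 1 if A is a blicket, b = 1 if B is.
    post = {(a, b): (q if a else 1 - q) * (q if b else 1 - q)
            for a in (0, 1) for b in (0, 1)}
    for on_a, on_b, active in trials:
        for (a, b) in post:
            predicted = int((a and on_a) or (b and on_b))
            if predicted != active:
                post[(a, b)] = 0.0        # deterministic: rule it out
    z = sum(post.values())
    p_a = (post[(1, 0)] + post[(1, 1)]) / z
    p_b = (post[(0, 1)] + post[(1, 1)]) / z
    return p_a, p_b

# Trial 1 (A and B on, active): both blocks look likely to be blickets.
print(blicket_posteriors([(1, 1, 1)]))             # (0.588, 0.588)
# Trial 2 (A alone, active): A is certain, B falls back to its prior q.
print(blicket_posteriors([(1, 1, 1), (1, 0, 1)]))  # (1.0, 0.3)
```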

67 Three kinds of causal induction contingency data physical systems perceived causality

68 Michotte (1963)
Perceived causality is affected by…
– timing of events
– velocity of balls
– proximity

69 A simple theory: Newton + noise

70 Strategy for evaluating the theory (Sanborn, Mansinghka, & Griffiths, 2009)
– Generate a set of collisions varying in velocity, timing, and proximity
– Ask people to judge each as “real” or “random”
– With the prior P(real) fixed, estimate P(d|real) from judgments of P(real|d), as sketched below
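
A sketch of that inversion (my formulation of the odds-ratio algebra):

```python
# Sketch: with the prior P(real) fixed, a judged posterior P(real|d)
# pins down the likelihood ratio P(d|real) / P(d|random).

def likelihood_ratio(p_real_given_d, p_real=0.5):
    prior_odds = p_real / (1 - p_real)
    posterior_odds = p_real_given_d / (1 - p_real_given_d)
    return posterior_odds / prior_odds

# If 80% of judgments call a collision "real", that collision is four
# times more probable under the model than under the "random" baseline:
print(likelihood_ratio(0.8))   # 4.0
```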

71 Results
[scatter plots comparing the Newton + noise model against human “real” and “random” judgments]

72 Conclusions
Causal induction from contingency data is a well-studied problem, and one where learning structure and strength can provide insights.
Approaching causal induction as Bayesian inference with causal graphical models has the potential to explain many other phenomena:
– learning from rates
– learning complex causal structures
– detecting coincidences
– perceptual causality
– inferences about physical causal systems
– learning and reasoning about dynamic causal systems


