
Machine Reading from Multiple Texts Peter Clark and John Thompson Boeing Research and Technology.


1 Machine Reading from Multiple Texts Peter Clark and John Thompson Boeing Research and Technology

2 What is Machine Reading?
• Not (just) parsing, fact extraction
• Construction of a coherent representation of the scene the text describes
• Challenge: much of that representation is not in the text
“A soldier was killed in a gun battle”
The soldier died
The soldier was shot
There was a fight
…

3 What we are trying to do:
• Multiple text approach:
  • Reduce need for precision/coverage on individual texts
  • Assess confidence using redundancy
  • Exploit the vast amount of text available
• Domains: 2-stroke engines, Pearl Harbor

4 What we’re trying to do: 2 Stroke engines
Multiple Input Texts:
“...the mixture of fuel and air in the cylinder has been compressed. This mixture ignites when the spark plug generates a spark. Igniting the mixture causes an explosion. The explosion forces the piston down....”
“...The piston compresses the air-fuel mixture in the combustion chamber. The vacuum in the crankcase sucks a fresh mixture of air-fuel-oil into the cylinder. The spark from the spark plug begins the combustion stroke…”
Output: Single, Coherent Representation
[Diagram nodes: Compress mixture; Suck in fresh mixture; Generate spark; Ignite mixture; Mixture explodes]

5 What we’re trying to do: Pearl Harbor
Multiple Input Texts:
“…at 6am, the first attack wave of 183 Japanese planes takes off from the carriers and heads for Pearl Harbor. At 7:53am the first Japanese assault wave commences the attack, targeting airfields and battleships. Eight battleships are damaged, with five sunk…”
“As the sun was just beginning to rise, a fleet of Japanese forces were taking off from carriers in various locations in the Pacific. At 7:55am, just as many islanders were waking up for breakfast, the first Japanese bomb was dropped on Wheeler Field, eight miles from Pearl Harbor….Most planes returned to their carriers intact…”
Output: Single, Coherent Representation
[Diagram nodes: Japanese planes take off; Planes fly to Pearl Harbor; Planes bomb airfields & ships; Eight battleships damaged; Planes return]

6 Incredibly Challenging
• Basic language processing is hard → Need high-quality language engine
• Multiple alignments and implications of text → Treat reading as model building, not fact extraction
• Multiple viewpoints/perspectives → Knowledge-guided model extraction process

7 Basic language processing is hard
• Usual suspects: syntax, WSD, SRL, LF, NE, …
• Discourse structure contains much implicit knowledge (e.g., parts, event ordering)
“A two-stroke engine's combustion stroke occurs when the spark plug fires. At the beginning of the combustion stroke, the mixture of fuel and air in the cylinder has been compressed. This mixture ignites when the spark plug generates a spark. Igniting the mixture causes an explosion. The explosion forces the piston down. The piston compresses the mixture in the crankcase as it moves down. As the piston approaches the bottom of its stroke, the exhaust port is uncovered. The pressure in the cylinder forces exhaust gases out of the cylinder. As the piston reaches the bottom of the cylinder, the intake port is uncovered. The piston's movement pressurizes the mixture in the crankcase. The mixture displaces the burned gases in the cylinder.”

8 Incredibly Challenging
• Basic language processing is hard → Need high-quality language engine
• Multiple alignments and implications of text → Treat reading as model building, not fact extraction
• Multiple viewpoints/perspectives → Knowledge-guided model extraction process

9 Want:

10 Finding Equivalences, Entailments, and Matches
“...the mixture of fuel and air in the cylinder is compressed. This mixture ignites when the spark plug generates a spark. Igniting the mixture causes an explosion…”
“....The piston compresses the air-fuel mixture in the combustion chamber. The vacuum in the crankcase sucks a fresh mixture of air-fuel-oil into the cylinder. The spark from the spark plug begins the combustion stroke…”
• Basic operation: relating (then integrating) texts

11 Finding Equivalences, Entailments, and Matches
“...the mixture of fuel and air in the cylinder is compressed. This mixture ignites when the spark plug generates a spark. Igniting the mixture causes an explosion…”
“....The piston compresses the air-fuel mixture in the combustion chamber. The vacuum in the crankcase sucks a fresh mixture of air-fuel-oil into the cylinder. The spark from the spark plug begins the combustion stroke…”
[Diagram label: T]

12 Finding Equivalences, Entailments, and Matches
“...the mixture of fuel and air in the cylinder is compressed. This mixture ignites when the spark plug generates a spark. Igniting the mixture causes an explosion…”
“....The piston compresses the air-fuel mixture in the combustion chamber. The vacuum in the crankcase sucks a fresh mixture of air-fuel-oil into the cylinder. The spark from the spark plug begins the combustion stroke…”
[Diagram labels: H, T; candidate relations: ? → ? = ? ← ?]
Textual “Entailment” = The “Modus Ponens” of NLU

13 Recognizing Textual Entailment (RTE)
• Task: does H “reasonably” follow from T?
  • (or: what is the relationship between T and H?)
• Annual RTE competition for 4 years
• Is very difficult, and largely unsolved still
  • typical scores ~50%-70% (baseline is 50%)
  • RTE4 (2008): Mean score was 57.5%
T: The piston's movement pressurizes the mixture.
H: The piston compresses the mixture.

14 Examples
T: The piston's movement pressurizes the mixture.
H: The piston compresses the mixture.
T: A 1,760 pound armor-piercing shell slammed through the deck and hit the ship’s forward ammunition magazine.
H: A 1,760 pound bomb penetrated into the front of the ship.
A few are easy(ish)… but most are difficult…

15 Boeing’s RTE System
1. Interpret texts using BLUE (Boeing Language Understanding Engine)
2. See if:
  • H subsumes (is implied by) T
    H: “An animal eats a mouse” ← T: “A black cat eats a mouse”
  • H subsumes an elaboration of T
    H: “An animal digests a mouse” ← T: “A black cat eats a mouse”
    via IF X eats Y THEN X digests Y
Two sources of World Knowledge
  • WordNet subsumption and part of speech relations
  • DIRT paraphrases

16 BLUE’s Pipeline
Input: “Igniting the mixture causes an explosion.”
Parse + Logical form:
(DECL ((VAR _X1 "the" "mixture") (VAR _X2 NIL (S (ING) NIL "ignite" _X1)) (VAR _X3 "an" "explosion")) (S (PRESENT) _X2 "cause" _X3))
Initial Logic:
"mixture"(mixture01), "ignite"(ignite01), sobject(ignite01,mixture01), "explosion"(explosion01), "cause"(cause01), subject(cause01,ignite01), sobject(cause01,explosion01).
Final Logic:
isa(mixture01,mixture_n1), isa(ignite01,light_v4), isa(explosion01,explosion_n1), causes(ignite01,explosion01), object(ignite01,mixture01).

17 “Lexico-semantic inference”
• Subsumption
T: A black cat ate a mouse
   subject(eat01,cat01), object(eat01,mouse01), mod(cat01,black01)
H: A mouse was eaten by an animal
   “by”(eat01,animal01), object(eat01,mouse01)
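
The subsumption test on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BLUE's actual implementation: the `HYPERNYM` table is a made-up stand-in for WordNet, the `SAME_ROLE` mapping (treating the passive “by” phrase as a subject) is hypothetical, and triples are matched word by word without the variable binding a real system would need.

```python
# Toy hypernym table standing in for WordNet (hypothetical data).
HYPERNYM = {"cat": "feline", "feline": "animal", "mouse": "rodent", "rodent": "animal"}

# Hypothetical role normalization: the passive "by" phrase fills the subject role.
SAME_ROLE = {"by": "subject"}


def is_a(word, general):
    """True if `general` equals `word` or is one of its ancestors in HYPERNYM."""
    while word is not None:
        if word == general:
            return True
        word = HYPERNYM.get(word)
    return False


def normalize(rel):
    """Map a surface relation onto its canonical role name."""
    return SAME_ROLE.get(rel, rel)


def subsumes(h_triples, t_triples):
    """H subsumes T if every H triple is matched by an equal or more specific T triple."""
    return all(
        any(
            normalize(h_rel) == normalize(t_rel)
            and is_a(t_head, h_head)
            and is_a(t_arg, h_arg)
            for t_rel, t_head, t_arg in t_triples
        )
        for h_rel, h_head, h_arg in h_triples
    )


# T: A black cat ate a mouse
T = [("subject", "eat", "cat"), ("object", "eat", "mouse"), ("mod", "cat", "black")]
# H: A mouse was eaten by an animal
H = [("by", "eat", "animal"), ("object", "eat", "mouse")]

print(subsumes(H, T))  # True: H is more general than T
print(subsumes(T, H))  # False: T says more than H does
```

The extra `mod(cat, black)` triple in T is simply ignored, which is why H can subsume T but not the other way round.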

18 With Inference…
T: A black cat ate a mouse
Rules:
IF X isa cat THEN X has a tail
IF X eats Y THEN X digests Y
T’: A black cat ate a mouse. The cat has a tail. The cat digests the mouse. The cat chewed the mouse. The cat is furry. ….

19 With Inference…
T: A black cat ate a mouse
Rules:
IF X isa cat THEN X has a tail
IF X eats Y THEN X digests Y
T’: A black cat ate a mouse. The cat has a tail. The cat digests the mouse. The cat chewed the mouse. The cat is furry. ….
H: An animal digested the mouse.
H subsumes T’
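
A minimal sketch of “elaborate T, then test subsumption”, assuming facts are `(predicate, subject, object)` tuples and rules are simple predicate rewrites. The encoding, the `RULES` list, and the one-entry `HYPERNYM` table are all hypothetical; only the eats/digests rule from the slide is modelled, and unary rules such as “X isa cat” are left out.

```python
# Toy stand-in for WordNet (hypothetical data).
HYPERNYM = {"cat": "animal"}

# "IF X eats Y THEN X digests Y", encoded as (premise predicate, conclusion predicate).
RULES = [("eat", "digest")]


def is_a(word, general):
    """True if `general` equals `word` or is one of its ancestors in HYPERNYM."""
    while word is not None:
        if word == general:
            return True
        word = HYPERNYM.get(word)
    return False


def elaborate(facts, rules):
    """Forward-chain: keep adding rule conclusions until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, x, y in list(known):
                new_fact = (conclusion, x, y)
                if pred == premise and new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


def entails(t_facts, h_fact, rules):
    """True if H subsumes some fact in the elaboration T' of T."""
    h_pred, h_x, h_y = h_fact
    return any(
        pred == h_pred and is_a(x, h_x) and is_a(y, h_y)
        for pred, x, y in elaborate(t_facts, rules)
    )


T = [("eat", "cat", "mouse")]        # A black cat ate a mouse
H = ("digest", "animal", "mouse")    # An animal digested the mouse

print(entails(T, H, RULES))  # True: the rule adds digest(cat, mouse) to T'
print(entails(T, H, []))     # False: without the rule, H does not follow
```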

20 Acquiring paraphrase/inference rules
• Where do the rules come from?
  • paraphrasing technology can learn these, e.g., DIRT
IF X loves Y THEN X likes Y
[Figure: word-frequency histograms of the X and Y fillers (table, chair, bed, cat, dog, Fred, Sue, person) for “X loves Y”, compared with those for “X falls to Y”: ?]

21 Acquiring paraphrase/inference rules
• Where do the rules come from?
  • paraphrasing technology can learn these, e.g., DIRT
IF X loves Y THEN X likes Y
[Figure: word-frequency histograms of the X and Y fillers (table, chair, bed, cat, dog, Fred, Sue, person) for “X loves Y”, compared with those for “X falls to Y”]

22 Acquiring paraphrase/inference rules
• Where do the rules come from?
  • paraphrasing technology can learn these, e.g., DIRT
IF X loves Y THEN X likes Y
[Figure: word-frequency histograms of the X and Y fillers (table, chair, bed, cat, dog, Fred, Sue, person) for “X loves Y”, compared with those for “X likes Y”: ?]

23 Acquiring paraphrase/inference rules
• Where do the rules come from?
  • paraphrasing technology can learn these, e.g., DIRT
IF X loves Y THEN X likes Y
[Figure: word-frequency histograms of the X and Y fillers (table, chair, bed, cat, dog, Fred, Sue, person) for “X loves Y”, compared with those for “X likes Y”]
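
The idea behind these slides, that two patterns are likely paraphrases when their X and Y slots are filled by similar words, can be sketched as follows. All counts are invented for illustration, and plain cosine similarity is swapped in for simplicity; DIRT itself uses a mutual-information-weighted similarity, so this is a simplification rather than the DIRT algorithm.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two filler-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pattern_similarity(p, q):
    """Geometric mean of the X-slot and Y-slot filler similarities."""
    return sqrt(cosine(p["X"], q["X"]) * cosine(p["Y"], q["Y"]))


# Made-up filler counts for three patterns (hypothetical data).
loves = {
    "X": Counter(Fred=5, Sue=4, person=3, dog=1),
    "Y": Counter(Sue=4, Fred=3, dog=3, cat=2),
}
likes = {
    "X": Counter(Fred=4, Sue=5, person=2, cat=1),
    "Y": Counter(Sue=3, Fred=3, dog=2, cat=3, chair=1),
}
falls_to = {
    "X": Counter(table=1, chair=2, person=3),
    "Y": Counter(bed=2, table=1),
}

print(pattern_similarity(loves, likes))     # high: similar fillers in both slots
print(pattern_similarity(loves, falls_to))  # low: the Y slots share no fillers
```

A pair whose score clears some threshold would then be proposed as a rule such as IF X loves Y THEN X likes Y. Note that similarity is symmetric, so a score of this kind says nothing about the direction of the implication.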

24 Some selected paraphrases from DIRT
IF Sergei organizes a symposium THEN:
Sergei promotes a symposium.
Sergei participates in a symposium.
Sergei makes preparations for a symposium.
Sergei intensifies a symposium.
Sergei denounces a symposium.
Sergei urges a boycott of a symposium.

25 Good Entailments and Alignments
“…Burned gases flow out of the cylinder through the exhaust port…”
“....The pressure in the cylinder displaces the burned gases from cylinder….”
(DIRT) IF Y is displaced from X THEN Y pours out of X
the burned gases pour out of the cylinder
(WordNet)

26 Good Entailments and Alignments
“...The piston compresses the mixture in the crankcase….”
“…The piston’s movement pressurizes the mixture in the crankcase….”
(DIRT) IF X’s movement changes Y THEN X changes Y
the piston pressurizes the mixture
(WordNet)

27 Bad Entailments
“The air-fuel mixture goes into the cylinder as the piston moves…”
“....The burned air-fuel mixture exits the cylinder through the exhaust port…”
(DIRT) IF X exits Y THEN X squeezes into Y
the mixture squeezes into the cylinder
(WordNet)

28 Other entailments
“…….... Following the explosion, the exploding gases push the piston, forcing it down the cylinder…”
the piston is moved down by the gases
the gases pull the piston
the piston militates against the gases
the gases drive the piston

29 The Bottom Line
• Simply finding local alignments, and computing local implications, is not enough
  • Machine-learned world knowledge is too noisy
  • Local decisions are unacceptably error-prone
• Reading is not (just) a set of local processes
• Rather: Also need a “global” aspect: Machine Reading = a process of model formation
  • a search for a “most coherent” set of facts

30 [Diagram columns: Text, Interpretation, Entailments]
The exploding gases push the piston down the cylinder…
The gases push the piston down.
The explosion of the gases drive the piston…
The gases are moved by the piston.
The gases drive the piston.
The gases propel the piston.
The gases race the piston.
The gases pull the piston.
The gases push the piston.

31 [Diagram columns: Text, Interpretation, Entailments]
The exploding gases push the piston down the cylinder…
The gases push the piston down.
The explosion of the gases drive the piston…
The gases are moved by the piston.
The gases drive the piston.
The gases propel the piston.
The gases race the piston.
The gases pull the piston.
The gases push the piston.

32 The exploding gases push the piston down the cylinder…
The gases push the piston down.
The explosion of the gases drive the piston…
The gases are moved by the piston.
The gases drive the piston.
The gases propel the piston.
The gases race the piston.
The gases pull the piston.
The gases push the piston.
Best, consistent subset of elaborations = Overall, integrated theory

33 Is a Markov-based search process:
• Can transform this to a satisfiability problem…
Propositions:
P1: gases push piston down
P2: gases drive piston
P3: gases pull piston
P4: gases propel piston
Formulae (“Things we’d like to be true”): P1; P2; P1 → P3; P1 → P4; P2 → P4; not P1 & P3
Weights: ∞ 10 8 10 ∞
[Annotations: Given fact; DIRT rule; Inconsistent → facts can’t both hold]
• Maximize (weighted) number of happy (satisfied) formulae!

34 Is a Markov-based search process:
• Can transform this to a satisfiability problem…
Propositions:
P1: gases push piston down
P2: gases drive piston
P3: gases pull piston
P4: gases propel piston
Formulae (“Things we’d like to be true”): P1; P2; P1 → P3; P1 → P4; P2 → P4; not P1 & P3
Weights: ∞ 10 8 10 ∞
Best assignment: t f t
Results in: t f t
[Annotations: Given fact; DIRT rule; Inconsistent → facts can’t both hold]
• Maximize (weighted) number of happy (satisfied) formulae!
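
The weighted satisfiability view can be illustrated with a brute-force search over all truth assignments. This is a sketch under loud assumptions: the transcript lists six formulae but only five weights, so the pairing of weights to formulae below is a guess made for illustration, “not P1 & P3” is read as not (P1 and P3), and exhaustive enumeration stands in for whatever search procedure the authors actually used.

```python
from itertools import product

PROPS = ["P1", "P2", "P3", "P4"]

# Hard constraints (weight ∞). ASSUMPTION: both given facts and the
# inconsistency constraint are hard; the slide does not make this explicit.
HARD = [
    lambda v: v["P1"],                    # given fact: gases push piston down
    lambda v: v["P2"],                    # given fact: gases drive piston
    lambda v: not (v["P1"] and v["P3"]),  # push and pull can't both hold
]

# Soft constraints from DIRT-style rules. ASSUMPTION: weights 8, 10, 10 are
# assigned to the three implications for illustration only.
SOFT = [
    (8, lambda v: (not v["P1"]) or v["P3"]),   # P1 -> P3
    (10, lambda v: (not v["P1"]) or v["P4"]),  # P1 -> P4
    (10, lambda v: (not v["P2"]) or v["P4"]),  # P2 -> P4
]


def best_assignment():
    """Return the assignment that meets every hard constraint and
    maximizes the total weight of satisfied soft formulae."""
    best, best_score = None, -1
    for values in product([True, False], repeat=len(PROPS)):
        v = dict(zip(PROPS, values))
        if not all(f(v) for f in HARD):
            continue
        score = sum(w for w, f in SOFT if f(v))
        if score > best_score:
            best, best_score = v, score
    return best, best_score


assignment, score = best_assignment()
print(assignment)  # {'P1': True, 'P2': True, 'P3': False, 'P4': True}
print(score)       # 20
```

Under these assumed weights the noisy rule P1 → P3 (“push” implies “pull”) is the one formula left unsatisfied, which is the behaviour the slide is after: a global search discards a locally plausible but incoherent entailment.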

35 Incredibly Challenging
• Basic language processing is hard → Need high-quality language engine
• Many possible equivalences and implications → Treat reading as model building, not fact extraction
• Multiple viewpoints/perspectives → Knowledge-guided model extraction process

36 Want:

37 Got:
• Do have coherent, supported facts
• BUT:
  • There’s a lot going on in any scene!
  • Multiple viewpoints and levels of detail
Fuchida shouted “Tora! Tora!”
The ships reached position.
It was a sunny day.
The attack was audacious.

38 • Better: Use world knowledge to guide what to look for
  • e.g., scripts of generalized event sequences
Expectations/ Scripts

39 Expectations/ Scripts (Entailment-like reasoning again!)

40 System can still make mistakes…
….Japanese submarines attacked Pearl Harbor…
….torpedoes attacked Pearl Harbor…
….bombers attacked Pearl Harbor…
WordNet sandwich#n2: “submarine”, “hoagie”, “torpedo”, “sandwich”, “poor boy”, “bomber”: a large sandwich made with meat and cheese
Pearl Harbor is being attacked by sandwiches (!)

41 Summary
• Machine Reading from multiple texts
  • tolerate gaps, ambiguity, errors through redundancy
• Three critical requirements
  • High-quality language engine
  • Reading as model building, not fact extraction
    • entailment technology as “modus ponens” of NLU
    • search for coherence to overcome (many) local errors
  • Knowledge-guided model extraction process
    • expectations to guide what to look for
• Implications of success are huge!

